DOCUMENT RESUME

ED 037 476          24          TE 500 596

AUTHOR         Jewell, Ross M.; And Others
TITLE          The Effectiveness of College-Level Instruction in
               Freshman Composition. Final Report.
INSTITUTION    Northern Iowa Univ., Cedar Falls.
SPONS AGENCY   Office of Education (DHEW), Washington, D.C. Bureau
               of Research.
BUREAU NO      BR-5-0803
PUB DATE       69
CONTRACT       OEC-SAE-OE-4-10-053
NOTE           273p.

EDRS PRICE     EDRS Price MF-$1.25 HC-$13.75
DESCRIPTORS    Academic Performance, *College Freshmen, Composition
               (Literary), *Composition Skills (Literary), Computer
               Assisted Instruction, English, English Education,
               *English Instruction, Experiments, Performance
               Tests, *Research, Sex (Characteristics),
               *Standardized Tests, Statistical Data, Tables
               (Data), Verbal Tests



ABSTRACT

This final report of a two-stage project describes
an effort to determine whether students receiving instruction in
freshman English composition perform better on standardized tests
than students who do not receive similar instruction, when both
groups are in college the same length of time. The second phase of
the experiment detailed in the report involves 1,040 matched pairs of
students from the University of Northern Iowa, the University of
Iowa, Kent State University, the University of Colorado, and Northern
Illinois University. Using the Cooperative English Tests: English
Expression (COOP), the College Entrance Examination Board (CEEB)
English Composition Test, and a theme as test instruments, the
authors include computer-generated test results covering frequent
intervals over the 2-year period on: (1) overall performance, (2)
performance by ability quarters by sex, and (3) performance by sex.
Procedures, summary, and recommendations are included.
Statistical tables reveal performance data. (RL)
FOREWORD



This is the final report of Research Project 2188 amended 
Nelson. He should not be held responsible for any which remain.
[...] College-Level Instruction in Freshman Composition
(Cooperative Research Project 2188), Cedar Falls, Iowa: State
College of Iowa [year illegible]. The State College of Iowa became
the University of Northern Iowa in 1967.
SUMMARY



This is the Final Report of Research Project 2188 as amended. 
The hypothesis involved was that on tests related to writing, 
performance of students receiving instruction in freshman English 
composition does not differ significantly from the performance of 
similar students not receiving instruction in freshman English 
composition, when both groups have been in college the same length 
of time. The research was conducted in two stages. The first 
stage, at the University of Northern Iowa, began with the fall 
semester, 1963, and concluded with the spring semester, 1965. The 
first phase was reported in an Interim Report. The second stage,
in which the University of Northern Iowa was joined by the
University of Iowa, Kent State University, the University of
Colorado, and Northern Illinois University, began in September 1964 
and ended in May 1966. The present report concerns the second stage. 

In the fall of 1964 the basic pool of 4,190 freshman students
from the five institutions combined was subdivided randomly, within
sex and ACT English score, into experimentals (N=1,408) and controls
(N=2,782). The experimental students did not enroll in freshman
composition courses; the control students did. Following testing
at the beginning of the fall semester of 1964, 1,040 matched pairs
of students were formed on the basis of sex, age, and scores on the
Cooperative English Tests: English Expression (COOP), College
Entrance Examination Board English Composition Test (CEEB), and a
theme. The three tests were again administered at the end of the
first semester, second semester, and fourth semester. Numbers of
fully described matched pairs who persisted were, respectively, 597,
365, and 122.
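
The attrition implied by these counts can be made explicit with a little arithmetic. The sketch below is an editorial illustration and not part of the original analysis: the names `INITIAL_PAIRS`, `PERSISTING_PAIRS`, and `retention_percent` are hypothetical, and only the counts (1,040, 597, 365, 122) come from the report.

```python
# Illustrative only: all names are hypothetical; the counts come from the report.
INITIAL_PAIRS = 1040

# Fully described matched pairs remaining at each later testing point.
PERSISTING_PAIRS = {
    "end of first semester": 597,
    "end of second semester": 365,
    "end of fourth semester": 122,
}


def retention_percent(remaining: int, initial: int = INITIAL_PAIRS) -> float:
    """Return the share of the initial matched pairs still present, in percent."""
    return round(100.0 * remaining / initial, 1)


retention = {point: retention_percent(n) for point, n in PERSISTING_PAIRS.items()}

if __name__ == "__main__":
    for point, pct in retention.items():
        print(f"{point}: {pct}% of the original pairs")
```

On these figures roughly 57 percent of the pairs remained after one semester, 35 percent after two, and 12 percent after four, which is worth keeping in mind when reading the later comparisons.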

Results partially confirmed and partially denied the hypo- 
thesis. Of nine main comparisons — COOP, CEEB, and theme at the end 
of the first semester, the end of the second semester, and the end 
of the fourth semester — the null hypothesis was denied on three, 
the control students performing significantly better than the experi- 
mentals on COOP and theme at the end of the first semester, and on 
COOP at the end of the second semester. 

Test scores were also analyzed in terms of sex and ability 
level of students. Females performed consistently better than 
males, particularly at the lowest one-quarter of ability. Teachers 
and researchers should not overlook this differential between the 
sexes in performance on tests related to composition. It was also 
at the lowest ability level that there were the strongest indications 
of some superiority of the control subgroup over the experimental 
subgroup. 
Data were also analyzed for each of the three criterion
measures for the 122 pairs who were available at the final testing
date. In this analysis, which used a constant N of 122 instead of
diminishing N's (597, 365, 122), none of the nine comparisons
between experimental subgroup and control subgroup yielded a
significant difference in means.
BACKGROUND



Statement of the Problem 



Research in college composition has not been plentiful, and
most of the studies reported have concentrated on comparing some
innovation with a standard procedure. Variables have ranged from the
number of student papers written through the amount of teacher comment
on each paper to the influence of such subjects as rhetoric and grammar
on the performance of the student. In every case the other element in
the comparison was the particular arrangement of freshman composition
at the institution in which the research was done. Seldom has a
statistically significant difference appeared, and the difficulty is
that, even where it has, the difference has been between a particular
innovation and what might be termed standard procedure. A tacit assumption
in such research has been that the "standard" course improved student
writing and the question was whether the innovation would produce a
result different from that produced by the standard course. These
investigations seldom included comparisons of the results with an
arrangement involving no formal instruction in English composition.

A second difficulty with the research reported has been that
the statistical comparisons involved a relatively small number of
students. The question is always present as to whether the sample
employed is sufficiently large and broadly based to be reasonably
representative of a given group (for example, all entering college
freshmen in a substantial number of American colleges). In those few
instances in which a statistically significant difference has been
found, the degree to which generalizations beyond the samples
investigated may be made is uncertain.

The present investigators decided to attempt to overcome both
of these deficiencies. They planned to compare students who had
received no instruction of the sort generally given in freshman
composition with comparable students who had received such instruction.
In order to develop statistics for a reasonably broad and a reasonably
diverse population, they planned to engage several institutions in
replicating the experiment. This procedure would give a numerical,
geographical, and academic variety to the population. If the results
in the participating institutions were in substantial agreement, the
conclusions could be stated with considerable force.

The goals of the investigation, then, were to test two
hypotheses:

(1) That the writing performance of the students enrolled
in a freshman composition sequence is not significantly
different from the writing performance of students not
enrolled in a freshman composition sequence when the
two groups have been in college for an equal length of
time.

(2) That the results obtained in (1) will be present in 
many colleges or universities. 

A by-product of the testing of the hypotheses would be the 
accumulation of statistics based upon a reasonably large and diverse 
sample of students who had received no instruction in college 
freshman composition. Such a set of statistics might prove useful in 
providing a realistic and stable base for investigating the effect of 
innovation as well as of the "standard" course itself. Meaningful use 
of these statistics could be made only if the investigators testing 
an innovation utilized the evaluative instruments employed in the 
present investigation. 

The investigation was divided into a pilot phase, conducted 
at the University of Northern Iowa* from 1963 to 1965, and a major 
phase which ran from September 1964 through May 1966. The major phase 
is reported in this document. The results of the pilot study are 
available in the interim report. Procedures followed in the pilot 
phase were replicated at the University of Northern Iowa and at four 
other universities: the University of Colorado, the University of
Iowa, Kent State University, and Northern Illinois University. Each
of these institutions has been assigned, randomly, a number from one to 
five. Future references will be by these numbers and not by the names 
of the institutions. 



Selection of Cooperating Universities 



The United States Office of Education authorized a total of 
six institutions in the experiment. The investigators originally 
intended to include institutions that were varied in size and type: 
large, small, liberal arts, engineering, private, public, and so 
forth. Since a freshman class of close to 1,100 would be necessary 
to ensure sufficient retention, after two years, the total enrollment 
of each institution had to exceed 4,000. Some effort was made to 
include geographical distribution also. Time became a factor in the 
selection because the project was approved in April 1963, and the 
schools had to be selected in the fall of 1963, in order to enable 
them and the investigators to plan adequately for the start of the 
project in the fall of 1964. 



*On July 1, 1967, the State College of Iowa became the
University of Northern Iowa. Thus the whole investigation was 
completed while the school was called a college, while the report is 
being written under the new name. University of Northern Iowa will 
be used throughout. 
Of the thirty schools that responded to the first letter,
only five remained after the second round of correspondence, all
public universities: the University of Colorado, the University
of Iowa, Northern Illinois University, and Kent State University.*



Composition Programs in Participating Universities 

Figures 1 and 2 present data concerning these participating 
institutions and their freshman composition programs. Institutions 
2-5 were of similar size in 1964-65, all being more than twice the 
size of University 1. Total freshman composition enrollment, fall, 
1964, ranged from about 1,100 to about 4,111. Men constituted almost 
exactly one-half of the freshman enrollment at four institutions, but 
only 38 percent at the fifth. Graduate assistants were used for 
instruction at 3 of the 5 institutions. Teaching loads for full-time 
staff varied from 9 to 12 hours and class size from 22 to 30 students.
Some form of exemption from composition, and some method of optional, 
outside-of-class help were available at all institutions. Across- 
the-board class sectioning was the practice at one institution, sec- 
tioning for only high students at two, and no sectioning at two. 

Credit allowances for composition varied from 5 to 8 semester hours. 

There was variety in the content of the programs. All empha- 
sized exposition in the first semester or quarter, one emphasizing 
it for the year. Other common emphases the first semester were 
organization, central idea, and sentence structure. One institution 
included argument the first semester. Variety was greater in the 
second semester or quarter than in the first: one institution
continued exposition, two stressed argument, one imaginative writing,
and one literary analysis; three included some literature during the 
year, two did not. Three institutions required research papers in 
the second semester or third quarter, two did not. 

There were differences in matters other than content. The 
number of themes for the year varied from 16 to 22. The number of 
in-class themes varied from 2 to 8, with one university not reporting 
that item. Data concerning theme length were incomplete. Average 
theme lengths reported for the first semester were 300 to 500 words, 
for the second semester 400 to 950 (the latter including a 2,000-word 
research paper), and for the third term 1,250, including a research 
paper.



*The sudden death of Dr. Herbert Hackett of the State University 
of New York at Buffalo after the project was underway led to the 
elimination of that institution. 
                                        Institution
                                        One         Two         Three       Four        Five

Total Enrollment                        5,519       13,380      13,252      12,672      14,480
Total Freshman Enrollment, Sept. 1964   1,914       2,800       4,563       4,842       3,171
Total Freshman Enrollment, May 1965     1,649       2,700       4,400       3,849       2,974
Approximate Percent of Men Freshmen,
  September 1964                        38          50          50          50          53
Number of Instructors in Freshman
  Composition, Full Time                14+         20          48+         41          12
Number of Graduate Assistants as
  Instructors in Freshman Composition   0           23          8           0           84
Normal Teaching Load                    9 hrs.      10-12 hrs.  10-12 hrs.  9 hrs.      10-12 hrs.
Average Class Size                      30          25-26       25-30       27          22
Exemption from Freshman Composition
  Possible?                             Yes         Yes         Yes         Yes         Yes
Classes Sectioned by Ability?           High only   No          No          High only   Yes*
Optional Outside-of-Class Help
  Available?                            Yes         Yes         Yes         Yes         Yes

+ Two part-time instructors also.
* Sections determined by Placement Test Scores.

FIGURE 1

GENERAL INFORMATION ON 1964-65 ENROLLMENT AND FRESHMAN
COMPOSITION CLASSES FOR EACH INSTITUTION




                                  Institution
                                  One       Two       Three     Four      Five

Length of Program                 2 sem.    2 sem.    3 qr.     2 sem.    2 sem.
Credits                           5 sem.    6 sem.    9 qr.     8 sem.    8 sem.
No. of Themes, Term I             10        10        6         10        11(c)
No. of Themes, Term II            8         8(d)      6         7         11(c)
No. of Themes, Term III           --        --        4         --        --
No. of In-class Themes, Term I    2         0         2/3       5         --
No. of In-class Themes, Term II   1         2         2/3       3         --

Emphasis, Term I
  One:    Exposition, Development, Sentence Structure, Conventions,
          Lang. Study, Organization.
  Two:    Reading & Writing Expository Prose, Organization,
          Sentence Structure, Central Idea.
  Three:  Exposition, Opinion, Argument, Organization, Clarity,
          Precision.
  Four:   Exposition, Central Idea.(a)
  Five:   Speaking, Reading, Writing, Listening, Written & Oral
          Exposition.

Emphasis, Term II
  One:    Exposition, Effectiveness & Style, Semantics.
  Two:    Argument, Logic, Rhetoric.
  Three:  Imaginative & Emotional Writing.
  Four:   Literary Analysis, Research Paper.
  Five:   Reasoning, Argument, Criticism, Research.

Emphasis, Term III
  Three:  Research, Literary Analysis.

(a) Final grade determined by 500-word theme and objective test.
(b) Final grade based 2/3 on writing, 1/3 on literary analysis.
(c) Eight speeches in addition.
(d) One must be revised.

FIGURE 2

COMPOSITION PROGRAMS AT PARTICIPATING INSTITUTIONS, 1964-65
                                  Institution
                                  One           Two           Three         Four          Five

No. of In-class Themes, Term III  0             0             0             0             0
Theme Length, Term I              300 words     --            500 words     --            --
Theme Length, Term II             400 words     6-8 pp.       667 words     950 words(e)  --
Theme Length, Term III            --            --            1,250 words   --            --
Research Paper Required           No            No            Yes           Yes           Yes
Literature, Term I                No            No            No            --            No
Literature, Term II               No            No            Yes           Yes           No
Literature, Term III              --            --            Yes           --            --

(e) Including a 2,000-word research paper.

FIGURE 2
CONTINUED








Special mention should be made of the program at one
institution: a communications approach combining reading, writing,
speaking, and listening. Eleven themes and eight speeches were required
each semester. There was emphasis on exposition and argument; a research
paper was required, but no study of literature was included. This program
differed materially from all the others.

Therefore, several major types of composition programs were
included in this study: communications approach, stress on exposition
only, stress on exposition and argument, and stress on exposition and
literary analysis, some with and some without research papers. The
programs involved are representative of those at many state universities
requiring composition.



Related Research 



No research has come to the investigators' attention which is
directly comparable to the present study. Nearly all the research
compares some innovation with a standard procedure. Such studies
ordinarily vary the frequency of writing in the composition course
as the experimental variable. Most of these obtained no statistically
significant differences in the performance of the groups of students
at the end of instruction. A summary of projects with some relevance
to the current study is given below.



Arnold, Lois. Effects of Frequency of Writing and Intensity of
     Evaluation upon Performance in Written Composition of Tenth
     Grade Students (Cooperative Research Project Number 1523),
     Tallahassee: Florida State University, 1963, University
     Microfilms No. 63-6344.

Miss Arnold conducted her research in 1961-1962 at two
Florida high schools, in each of which a teacher was scheduled to
teach four groups of students in the tenth grade. The four
groups at each school were average classes, determined by
sectioning on the basis of scores on the following tests: Pintner
General Ability Test, Metropolitan Achievement Battery, School
and College Ability Test, and Differential Aptitude Tests.
Students were classified as low average, middle average, or high
average on the basis of the DAT scores. Nothing is said of
student-to-student matching. The experiment lasted for the
school year. Each teacher at each school used four teaching
methods, a different one for each of her four classes as follows:



1. Infrequent writing, moderate evaluation: one theme,
approximately 250 words, each six weeks. Evaluation was
concentrated on one matter each time: once on sentence
structure, once on organization, etc.

2. Frequent writing, moderate evaluation: some writing four
times a week, varying from two sentences to two pages or
more. The evaluation was handled as in 1 above.

3. Infrequent writing, intensive evaluation: one theme each
six weeks, approximately 250 words. Every error in
usage, sentence structure, and mechanics was marked and
detailed comments written on the paper. Students
corrected all errors, revised or rewrote until the paper
was satisfactory.

4. Frequent writing, intensive evaluation: one 250-word
theme weekly, evaluated meticulously as in 3 above
(pp. 40-2).

Two evaluative instruments were used, STEP Essay Tests and
STEP Writing Tests, the former a writing test, the latter an
objective test. Both were administered at the beginning and at
the end. Three experienced (former) English teachers
independently rated the STEP Essay Tests, the pretests in December and
January, and the post-tests in May and June.

Miss Arnold reached four conclusions:

1. There is no assurance that intensive evaluation is any
more effective than moderate evaluation in improving
the quality of written composition.

2. It must not be assumed that frequent practice is in
itself a means of improving writing.

3. There is no evidence that any one combination of
frequency of writing and intensity of evaluation is more
effective than another.

4. There is no indication that frequent writing and
intensive evaluation are any more effective for one ability
level than are infrequent writing and moderate
evaluation (p. 62).

In this study there was no significant difference between the
performances of men and women.
The University of Northern Iowa investigators wonder whether
graders might have evaluated more alike had they conferred on an 
occasional paper (four correlations were in the 0.50's, the others 
being 0.62 and 0.76), and why, in a gains study, all themes were not 
scored at a single time with prethemes and post-themes mixed. A 
table showing comparisons of the terminal data only would also have 
been helpful. That is, how did the groups compare at the end? 



Buxton, Earl W. "An Experiment to Test the Effects of Writing
     Frequency and Guided Practice upon Student's Skill in
     Written Expression." Unpublished doctoral dissertation,
     Stanford University, 1958. University Microfilms 58-3596.
     [As reported in Braddock, et al. Research in Written
     Composition. Champaign, Illinois: NCTE, 1963, pp. 58-70.]

This experiment involved 257 students in the University of 
Alberta who were enrolled in a special "one-year 'emergency' 
course designed to train teachers for Alberta schools." All 257, 
who constituted the entire enrollment in the emergency program, 
carried the same courses (a "canned" schedule). The total group
was divided into six classes: two control classes, in which
students did no extra, out-of-class writing; two writing classes,
in which students wrote a 500-word paper each week as an extra
out-of-class assignment for a total of sixteen weeks; two
revision classes, in which students did the same amount of 
writing on the same assignments as the writing classes. Writing 
classes were not required to write on the assigned topic and 
received only a brief paragraph of teacher comment at the end of 
each theme; there was no marking of errors nor commenting in the 
margin, and students were not asked to do anything with the 
papers after getting them back. The revision classes were 
required to write on the assigned topic and papers were marked in 
terms of unity, organization, logic, correctness, and such 
matters, with a general comment at the end. Students in the 
revision classes were asked to correct and revise their papers in 
class on the day the papers were returned and discussed. The 
teacher was present to give aid. 

Criterion measures were two parts of an earlier edition of
the Cooperative English Tests: "Mechanics of Expression" and
"Effectiveness of Expression" (alternate forms before and after),
and a theme. Each of two readers assigned a "content" score and
an "error" score to each theme. The content score was based on
fifteen factors with some factors weighted more than others. A
maximum potential score was allotted for each factor. Each
reader determined how much of that maximum to assign to that factor
in each paper. The error score was determined by counting errors
in spelling, punctuation, or mechanics. The points assigned for each
of the fifteen factors in a paper by each reader were added; then the
count for errors was subtracted from that. The scores for the two
readers were averaged, and that mean was arbitrarily divided by three
to get a usable scaled score.
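
The scoring arithmetic just described can be sketched in a few lines. This is an editorial illustration under stated assumptions, not Buxton's own procedure or materials: the function names are hypothetical, the sample points and error counts are invented, and the per-factor maximum weights (which the summary does not give) are left out.

```python
from statistics import mean
from typing import Sequence


def reader_score(factor_points: Sequence[float], error_count: int) -> float:
    """Add one reader's points over the content factors, then subtract the error count."""
    return sum(factor_points) - error_count


def scaled_theme_score(reader_scores: Sequence[float]) -> float:
    """Average the readers' scores and divide the mean by three."""
    return mean(reader_scores) / 3


# Invented example: two readers, fifteen content factors each.
reader_a = reader_score([6] * 15, error_count=6)            # 90 - 6 = 84
reader_b = reader_score([7] * 10 + [5] * 5, error_count=5)  # 95 - 5 = 90
example_score = scaled_theme_score([reader_a, reader_b])    # (84 + 90) / 2 / 3

if __name__ == "__main__":
    print(example_score)
```

Because the error count is subtracted before the readers are averaged, a paper with many mechanical errors loses points from each reader separately.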

The results of Buxton's study show that the revision students
(those whose papers were carefully marked and who were required to
revise them) made a significantly greater gain in writing achievement
as measured by the themes during the seven months of the study than
did the writing students (those who wrote the papers but did not revise
them). There was a more significant difference in gain scores between
the revision students and the control students, who wrote none of the
themes; this difference favored the revision students. Concomitant
conclusions: theme ratings are reliable if the raters are thoroughly
practiced in their system and frequently check on what they are doing,
and (since there was no significant difference between the groups on
the objective test scores) the theme ratings in this study measure
something that the particular objective test used did not measure.

It is not clear whether the division into groups took into 
account the balance of men and women. If, for example, the revision 
classes had more women than either of the other two groups, that could 
affect the results. 



Heys, Frank, Jr. "The Theme-a-Week Assumption: a Report of an
     Experiment," English Journal, 51 (May 1962), 320-22.

This experiment dealt with varying the amount of writing and
the amount of reading in high school English classes. Two classes
in each of the four high school grades were "as closely matched as
was possible under the normal sectioning practices of the school."
The two classes in each grade were taught by the same teacher; one was
designated as the writing class and the other as the reading class.
Students in each writing class wrote a theme a week. After it was
closely graded, the students corrected or rewrote it. Students in
each reading class wrote a theme every three weeks, and spent one class
day a week reading books of their own choice. Nothing is said concerning
grading or rewriting of the reading-class papers. Evaluation
instruments consisted of the STEP writing test and a theme, one of each
administered at the beginning and at the end of the experiment. The
themes were evaluated by three ETS readers using a nine-point scale.

The students in reading classes achieved a slightly greater 
improvement in writing scores than did those in writing classes. 
Generalizations arrived at by the investigator: 
1. Frequent writing practice probably yields greater
dividends in grade 12 than in grades 9, 10, 11.

2. Frequent writing practice probably yields greater
dividends with low groups than with middle or high
groups.

3. Frequent writing practice with low groups probably 
yields greater dividends within the area of content 
and organization than within the area of mechanics 
or of diction and rhetoric. 

4. The claim that "the way to learn to write is to write" 
is not substantiated by this experiment. 

5. The claim that ability to write well is related to the 
amount of writing done is not substantiated by this 
experiment. 

6. For many students reading is a positive influence on 
writing ability. 

7. The influence of reading on the ability to write 
appears to be a separate factor, not directly related 
to the teacher's personality and enthusiasm (p. 322). 

It is not clear how the fourth generalization is supported 
by the experiment. Since all students in the experiment wrote 
themes, how can it be inferred that the data failed to support 
the notion that students learn to write by writing? Furthermore, 
Heys does not indicate whether the improvement mentioned was 
statistically significant. 




Kincaid, Gerald L. "Some Factors Affecting Variations in the Quality 
of Students' Writing." Unpublished doctoral dissertation 
(Michigan State University, 1953). University Microfilms No. 
5922. 



This experiment attempted "to determine whether a single 
paper written on a given topic at a particular time [italics 
Kincaid's] can be considered as a representative sample of his 
[the student's] writing ability — and thus provide a valid basis 
for evaluating ability at any time in a writing course." It is 
of interest, not because it deals with a directly related problem, 
but because it has implications for any study using theme readers 
to evaluate results. A group of eighty college students was 
divided into four subgroups, each of which wrote two papers in 
one two-hour session on the same day and another two papers in a
similar session a week later. Three topics were used: Groups A
and C wrote on topics 1 and 2 each time (both argumentative);
groups B and D wrote on topics 1 and 3 each time (one
argumentative, one expository). Groups A and B wrote each time without
examination pressure (papers not counted toward grade); groups C
and D wrote without pressure once, and with it the other time
(papers counted on term grade the first time and not counted on
term grade the second time). Papers were rated by three
instructors selected from the freshman staff, the rating being made on
a ten-point scale (1 unsatisfactory, 10 superior) on each of five
categories: grammatical conventions, sentence structure, diction,
organization, and content. The score for a paper could lie
between 10 and 50; it was determined by computing the mean of the
two closest ratings; if the two extreme ratings were equidistant
from the middle rating or if the two closest ratings were more
than five points apart, the mean of all three was used.
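
The rule for combining the three ratings can be sketched as follows. This is an editorial illustration, not Kincaid's own procedure: the function name `combined_rating` and the parameter `max_gap` are hypothetical, and the sketch assumes that each "rating" is one rater's total across the five categories.

```python
from typing import Sequence


def combined_rating(ratings: Sequence[float], max_gap: float = 5) -> float:
    """Combine three raters' totals using the rule described in the text.

    Assumption: each rating is one rater's total over the five categories.
    """
    low, mid, high = sorted(ratings)
    lower_gap = mid - low    # distance between the two lower ratings
    upper_gap = high - mid   # distance between the two higher ratings

    # Extremes equidistant from the middle rating, or even the closest
    # pair more than max_gap apart: use the mean of all three ratings.
    if lower_gap == upper_gap or min(lower_gap, upper_gap) > max_gap:
        return (low + mid + high) / 3

    # Otherwise use the mean of the two closest ratings.
    if lower_gap < upper_gap:
        return (low + mid) / 2
    return (mid + high) / 2


if __name__ == "__main__":
    print(combined_rating([30, 32, 40]))  # two closest ratings are 30 and 32
    print(combined_rating([20, 30, 40]))  # extremes equidistant from the middle
```

The effect of the rule is to discount a single outlying rater when the other two roughly agree, and to fall back on all three when no two raters agree closely.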

Kincaid drew the following conclusions from this study: 



1. ... the findings from this study cast considerable 
doubt upon the justification of the customary practice 
of using five letter-grades to designate [individual] 
achievement in a writing course when a single paper 
provides the basis for that designation (p. 97). 

2. If an evaluation of overall or average improvement is 
all that is desired, it can be obtained from a single 
sample of each student's writing for a pretest and a 
post-test. . . (p. 99). 

3. ... in order to develop a program for evaluating
individual student improvement in writing (for strong as well
as for weak students), it would be advisable to obtain
several samples of writing by each student: samples of
writing on different topics on the same day and on the
same topics on different days. And such samples should
be obtained for both the pretest and the post-test
(p. 99).

Two matters impress the present investigators: 1) The theme
topics used by Kincaid were simpler than those used in the University
of Northern Iowa investigation. If more difficult topics had been
used by Kincaid the results might have been different. 2) The
findings of the Kincaid investigation support the use of group
average scores on a single pretheme and a single post-theme.
Kreisman, Arthur, et al. Pilot Study in English. Mimeographed
     report and dittoed summary of statistics. Ashland, Oregon:
     Southern Oregon College, 1963 (no pagination).

This is the report of a pilot study designed "to investigate
techniques and writing skills as a possible means of establishing
the basis for a more extensive research program." It is interesting
because the results led the Oregon investigators to abandon further
experimentation, and because one of those investigators suggested a
study like the University of Northern Iowa study. In the Oregon study,
both college freshmen and high school students were involved.
Control and experimental groups were matched at both levels: the 89
college students on the Verbal and Quantitative scores on SAT, the
total score on SCAT, and the sum of two ratings on the STEP Essay
Test; the 108 high school students on the score on the California
Test of Mental Maturity and the sum of two ratings on the STEP Essay
Test. Both control and experimental students were in each class.
The control students wrote a theme a week (a total of 9 for the college
group, 36 for the high school group); the experimental students wrote
a theme a month (a total of 3 for the college group, 10 for the high
school group). Evaluation was based upon comparison of the STEP Essay
ratings at the beginning and at the end of the experiment.

There was no significant difference between the college
experimental and control groups. The results for the high school
groups varied. There was a significant improvement for the below-average
high school students in the control group (more writing);
there was a slight (non-significant) drop in achievement for the
above-average students in the control group (more writing). There was
no significant difference in the experimental group (less writing).

Dr. Cloer, the statistician, wrote: "It would appear that the
principal beneficiaries of the experience in writing were those subjects
of below-average ability or those who might be called 'under-achievers.'"



Comments quoted from Kreisman: 

1. No adequate instrument for testing [composition] seems 
available. 

2. The difficulty of obtaining a sufficient number of 
students to make the experiment valid was one of the 
major obstacles. 

3. ... a purely quantitative experiment has little 
chance of being valid. 

4. ... one term of writing practice is not sufficient to
form a foundation for judgment regarding the develop- 
ment of writing ability. 

5. ... frequency may indeed be a factor in the development 
of writing ability. 

6. ... all experiments of this nature are of no value and 
invalid on an a priori basis. 

In the light of the University of Northern Iowa study, the 
following additional comments are of special interest, the first by 
Kreisman, the second by Cloer, the statistician: "The emphasis
that we thought might be fruitful [for future research] would be
one which dealt with student-teacher relationships or with maturation 
of students regardless of the courses they took," and "Perhaps a 
better 'experimental group' would be one that did no writing (in 
English classes) over the experimental period." 



McColly, William and Robert Remstad. Comparative Effectiveness of
Composition Skills Learning Activities in the Secondary
School (Cooperative Research Project 1528). Madison: University
of Wisconsin, 1963.

This study attempts to answer three questions: 

Does more writing alone result in better writing? 

Do more of "functional non-writing composition
learning activities" (practical instruction: working
with student-written papers, emphasizing spelling,
proof-reading, revision, etc.; group discussion;
teacher evaluation and comment) result in better
writing?

Does tutoring with immediate feedback (having the 
teacher present while the writing is being done and 
advising the student during the process) result in 
better writing? (p. 18) 

To answer the first question, dealing with the effect of the 
quantity of writing on improvement in writing, the investigators 
used two classes in the eighth grade and two classes in the ninth 
grade. To answer the questions relating to "functional non- 
writing activities" and immediate feedback (tutoring), three 
classes in each of the tenth, eleventh, and twelfth grades were 
used. Covariance techniques and, to the extent possible, random 
selection of samples were employed. 

To explore the effect of the amount of writing on improvement
in writing, control classes in the eighth and ninth grades
wrote a theme a month; experimental classes wrote a theme a week.
All other class activities and assignments were the same. During
the year, the eighth-grade control classes wrote 9 themes and the
eighth-grade experimentals wrote 35 themes. The ninth-grade
control classes wrote 8 themes, the experimentals 34.

To study the effect of non-writing activities and tutoring, 
one control class (a monthly theme with functional instruction), 
and two experimental classes (weekly theme and functional 
instruction), were organized at each grade level. About 9 writing 
tasks with functional activities were completed in the control 
classes, about 34 in the experimental classes. There were no 
individual conferences or "tutoring" activities in the first of 
these experimental classes in each grade. There were about 27 
regular "tutoring" sessions in the second experimental class in 
each grade. Thus, a ratio of 4-1 was maintained in writing tasks 
with functional activities between the experimental and control 
classes. 

Criterion and covariate measures for all students in the
experiment included: SCAT (IA, IIA, IIIA), Nelson-Denny Reading,
("Correctness and Appropriateness of Expression" and
"Ability to Interpret Literature"), previous English GPA, overall
GPA, and writing samples, two written before the experiment and
two written at the end.

Based on this experiment, the answer to the first question 
is no. Results indicated that increase in the amount of writing 
by itself has no significant effect upon the writing proficiency 
of high school students. Again, based on this experiment, the 
answer to the second question is affirmative; the answer to the 
third question is negative. Experimental classes with weekly 
theme and functional instruction improved significantly compared 
to the control classes. The experimental classes with tutoring scored, 
at the end of the experiment, about half way between the control 
classes and experimental classes without tutoring. 



Rohman, D. Gordon and Albert Wlecke. Pre-writing: The
Construction and Application of Models for Concept
Formation in Writing (Cooperative Research Project No.
2174), East Lansing, Michigan: Michigan State University,
1964.

This is one of the very few studies that have resulted in a
statistically significant difference between control and experimental
groups. Six sections of a college sophomore course in expository
writing with an emphasis on pre-writing activities constituted the
experimental group. Three sections were taught
each quarter for two quarters. The rest of the students enrolled
in the same course (11 sections in the Winter term, 10 in the
Spring term) constituted the control group. The total number of
students involved in the experiment is not disclosed. The experimental
course contained six units: 1. The role of the writer.
2. The escape from category (the concrete rather than the abstract).
3. The escape from cliche (avoiding someone else's way or words).
4. Dynamic relationship to the subject (an urgency to express
what the writer has "discovered"). 5. Concrete analogy (expressing
one's "discovery" by comparison with something like it). 6. Refinement
(finishing the essay). Three major techniques were used:
keeping a journal, meditation, and use of analogy. The control 
sections were taught as each teacher wished to teach them, with the 
exception that all instructors of the control sections assigned two 
500-word themes on topics used in the experimental sections. These 
themes were used in the evaluation. 

Evaluation of the experiment involved four devices: 1.
statements written by students in answer to the question: What
did you like or dislike about the course?, 2. statements by the
teachers who taught the course, 3. "objective" evaluation by readers
who did not teach the course, and 4. "subjective" evaluation by
teachers who did not teach the course. No objective testing was
reported.

Evaluation by students was strongly favorable. Major items 
were that the course was enjoyed, that it developed freedom in 
writing and in the discipline of writing and thinking, that criticism 
of student writing led to involvement in the process of writing, that 
attitudes toward writing had changed (regarding, for instance, the 
relationship between thinking and writing), that the use of analogy 
led to greater concreteness and clarity. Negative criticisms, which
were relatively few, included the following: the course was too short;
it was too piecemeal; not enough grades were given; class criticism
was too negative; the journal was an invasion of privacy; the use of
analogy was mechanical.

Instructors gave a number of reactions to the experiment, but 
their enthusiasm tended to center on three matters: the journal
as a device to stimulate students to meditate about their experiences
as well as to formulate their meditations in writing, the emphasis on 
the pre-writing process, and the freshness and soundness of the 
writing done. 

The essays for "objective" evaluation were selected from the
total submitted by control and experimental subgroups on the two
topics used by both subgroups. There were 226 experimental and
409 control essays evaluated. No information is given concerning
how these essays were selected. Essays were judged on a four-point
scale: 4. superior, 3. above average, 2. below average, 1.
incompetent. Three standards, unity, coherence, and emphasis,
were guides for the readers. There were eleven readers, four high
school teachers and seven college teachers. They worked in teams of
eight, three who read at the first session not reading at the
second, and three others substituting for them at the second. Each
theme was read twice. About 85 percent of the grades assigned were
either the same for each theme or only one point different, indicating
that the grading was relatively reliable. The results showed a
statistically significant difference between the experimental and control
groups in favor of the experimentals.



Four members of the English staff not involved in the experiment
read the papers "subjectively." They were given a randomly selected
sample of 50 experimental and 50 control themes. Rohman and Wlecke
informed these readers concerning which set was experimental and which
was control. Some investigators would not have done that. The readers
were asked to answer a series of three questions: "Which set of essays
seems to have more originality and in what ways? Generally, in which
set of essays does it seem more important for the writers to express
themselves and not be misunderstood? Which set of essays gives the
greater sense of form?" (pp. 130-1) In addition, the readers were asked
a series of specific questions concerning only the experimental essays,
such as: "Do the techniques employed in the experimental essays -- the
meditation in the 'Loneliness' essays, and the analogy in the 'Coming of
Age' essays -- seem to provide a more coherent means for the instructor to
gauge the success or failure of an essay?" All four readers gave the
experimental group of essays the higher rating.



Rohman and Wlecke leave so many questions unanswered that the 
report is difficult to interpret. How many students were in each 
sample? Were the students of the experimental sections similar in 
ability to those in the control sections? Did either sample have 
appreciably more women than the other? How were the themes that were 
evaluated selected? Do the 226 experimental themes represent a sampling 
comparable to the 409 control? Would a sampling of the control students 
have written as enthusiastically of their course as the experimentals 
did? To what degree did the Hawthorne effect operate? What implica- 
tions has this study for composition programs generally? 

Sutton, Joseph T. and Eliot Allen. The Effect of Practice and
Evaluation on Improvement in Written Composition
(Cooperative Research Project No. 1993). Deland, Florida:
Stetson University, 1964.

This study randomly divided college freshmen into five
groups. The first two of these (Groups I and II) served as controls.
During the period of the experiment, these two groups
received no instruction in composition and wrote no papers except
the six criterion themes which provided the "before" performance
and the six criterion themes which provided the "after" performance.
Group I wrote all twelve themes within a four-week period
at the beginning of the semester. Group II wrote the first six
criterion themes the first two weeks of the semester and the
second six criterion themes the last two weeks of the semester.
Groups III through V were the experimental groups, and all wrote
six criterion themes the first two weeks and another six the
last two weeks (as did Group II). In the ten-week interval
between the writing of criterion themes, Group III wrote no
papers but did evaluate four peer papers each week; Group IV
wrote one theme each week which was evaluated by the members of
Group III; and Group V wrote one class theme each week which was
evaluated by a "professor."

Five readers read each theme twice, once to rate it, once
to rank it in an order of excellence relative to the other eleven
themes by each writer. Rankings were based on five criteria: ideas,
mechanics, wording, form, and flavor, each one of which was scored
on a five-point scale. A total for the six "before" themes for
each student as graded by all five graders, divided by thirty (6
themes x 5 graders) gave an average score for each writer. The same
was done for the six "after" themes, and the averages were compared.
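
The averaging step described above can be sketched in a few lines of Python. This is only an illustration: the function name `writer_average` and the sample ratings are hypothetical, and the sketch assumes each grader contributes one number per theme, which the summary above does not spell out.

```python
def writer_average(ratings):
    """Average score for one writer.

    `ratings` holds one inner list per theme, each containing one score
    per grader. With 6 themes and 5 graders the divisor is 30, as in the
    description above.
    """
    total = sum(sum(theme) for theme in ratings)
    count = sum(len(theme) for theme in ratings)
    return total / count


# Hypothetical ratings for one student: 6 themes x 5 graders.
before = [[3, 4, 3, 3, 4] for _ in range(6)]
after = [[3, 3, 3, 3, 4] for _ in range(6)]

# A negative change would correspond to a decline in theme performance.
change = writer_average(after) - writer_average(before)
```

Comparing the two averages per writer, and then across groups, is the comparison the study reports.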

Particularly in relation to the University of Northern Iowa
study, Sutton and Allen's enterprise is interesting. First, none of
the students in any of the groups received direct instruction in
composition. Such instruction as Groups IV and V received came
from the marks and comments on their papers. Group III gained
experience in editing, though uninstructed in the procedure. Groups
I and II had no experience whatsoever with composition except the
twelve criterion themes. Thus, to a degree this study is similar to
the present one in that no direct instruction in freshman composition
was given and that some of the groups wrote only the criterion themes.
It is different from the present study in that there was not a direct
comparison between those completing a freshman program of writing
instruction and others not in the freshman English course at all.






The results in the Sutton and Allen study showed an unusual
inconsistency between the themes and the objective tests. In
theme performance, the members of the five groups showed a significant
decline during the experimental period. A decline was
observed for the five groups combined and for each group
separately. This decline was, of course, unexpected. The authors,
in speculating about its source, state: "Unfortunately, it appears
that the very procedure necessary to secure such stability [among
the theme performances] introduced other factors that may have had
a deleterious influence on the results." The frequency of writing
of test themes which were neither returned to the student nor
commented on seems, in the opinion of Sutton and Allen, to have
created an attitude of boredom and impatience among the students.
On each of the two objective tests, the Cooperative English Tests:
English Expression and the College Entrance Examination Board
English Test, the students showed significant improvement. This
was true for the five groups combined, and there was no significant
variation among the five groups in this respect.



Wolf, Melvin H. Effect of Writing Frequency upon Proficiency in
a College Freshman English Course (Cooperative Research
Project 2846), Amherst, Massachusetts: University of
Massachusetts, 1966.

This study involved six "regular" sections of college freshman
composition and four remedial sections. Two of the regular
sections, designated experimental-high frequency, wrote 39 themes
in the school year; two sections, designated experimental-low
frequency, wrote 8 themes in the year; two sections, designated
control, wrote 15 themes in the year, the usual number in freshman
composition at the University of Massachusetts. Two remedial
sections, designated experimental-high frequency, wrote 20 themes
in one semester; the other two, designated control, wrote 8
themes in one semester. These themes were carefully evaluated by
the instructors and were revised and resubmitted by the students.

The objective test used was Cooperative English Tests, Form 1C.
Six themes were used as tests: two written at the start, two at
the end of the first semester, and two at the end of the second
semester. The remedial students, being in the study only one
semester, wrote only the first four test themes. Evaluation of
the test themes was done by ten instructors under the direction
of an experienced instructor who had been a reader for the Educational
Testing Service. Wolf drew two conclusions: 1) writing
proficiency did not improve with the increase in frequency of
writing, 2) there was a high correlation between the scores on
objective tests of grammar and mechanics and scores of themes as
determined by the reading team. Since COOP has a section on
mechanics and a section on effectiveness but usually yields a single
score, it is not clear how the second conclusion was arrived at.



PROCEDURES



The overall design of the project involved selecting experimental
and control subgroups at each university and testing them on four
different occasions: the beginning of the freshman year (September
1964), the end of the first semester (January 1965), the end of the
first year (May 1965), and the end of the second year (May 1966).

Members of the experimental subgroup received no instruction in 
freshman composition; members of the control subgroup did receive 
instruction in freshman composition. The performance of these sub- 
groups was compared at each testing period to determine whether the 
observed differences in their performance on the criterion measures 
were statistically significant. Care was taken that the members of 
each subgroup at each university would be representative of the total 
freshman class entering that university in September 1964. Members 
of both experimental and control subgroups pursued a normal academic 
program except that the experimentals omitted the freshman composition 
course. The experimental subgroups took other courses instead of 
freshman composition, usually other general education courses, or courses 
in the major or minor. 



Establishing Matched Pairs 

Procedures for establishing matched pairs of students were 
developed to supply a number of pairs at the start sufficient to assure 
that after the attrition of two years enough pairs would remain to 
enable the investigators to draw sound conclusions. These procedures 
were predicated upon an incoming freshman class of 1,100 students, 
the approximate size of the freshman class at the smallest of the 
five universities in September 1964, and large enough to guarantee at 
least 300 matched pairs at the start. One-third of the 1,100 at each 
institution were designated experimental students and not permitted 
to enroll in freshman composition (experimental pool) ; the remaining 
two-thirds were designated control students and required to enroll in 
freshman composition (control pool). The selection of the students 
from the experimental pool necessarily antedated their actual enrollment 
in September 1964 in order to assure that they would not be enrolled in 
freshman composition. 

It was necessary, in those institutions which would enroll more 
than 1,100 freshmen, to devise a procedure which would reduce the 
potential participants to that number before the selection of the 
experimental and control pools was made. When approximately 85 percent 
of the expected, new, beginning freshman students had been cleared for 
admission at each such institution, 1,100 students were selected
from the total group by a random process. 

When the group of 1,100 was established at each institution,
the next step was to select a subgroup which would include approximately
one-third of the 1,100, would contain a ratio between men and
women representative of the total group at each university, and would
reflect the range of performance of that group on the English section
of the ACT (at four institutions), or the verbal section of the SAT
(at one institution). First, the students were divided by sex, and
then within each sex, ranked from high to low in terms of standard
scores on the English section of the ACT or the verbal score of the
SAT. By use of a table of random numbers, the investigators selected
33 percent of the students of each sex at each score level. The
students thus identified at each university became the experimental
pool; the remainder of each 1,100 became the control pool.
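
The selection just described amounts to stratified random sampling, with sex and score level as the strata. The sketch below illustrates the idea under stated assumptions: it substitutes Python's pseudo-random generator for the printed table of random numbers the investigators used, and the function name, the dictionary keys (`sex`, `score`), and the rounding rule within each stratum are all hypothetical.

```python
import random
from collections import defaultdict


def select_experimental_pool(students, fraction=0.33, seed=0):
    """Split students into (experimental_pool, control_pool).

    Students are grouped by (sex, score); within each group roughly
    `fraction` of the members are drawn at random for the experimental
    pool, and the rest go to the control pool.
    """
    rng = random.Random(seed)  # stand-in for a table of random numbers
    strata = defaultdict(list)
    for student in students:
        strata[(student["sex"], student["score"])].append(student)

    experimental, control = [], []
    for key in sorted(strata):
        group = list(strata[key])
        rng.shuffle(group)
        k = round(len(group) * fraction)  # assumed rounding rule
        experimental.extend(group[:k])
        control.extend(group[k:])
    return experimental, control
```

Sampling within each sex and score level, rather than from the class as a whole, is what keeps the experimental pool representative on both variables.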

Matching for all schools was performed at the Data Processing 
Center at the University of Northern Iowa after the September 1964 
testing. After the themes had been scored, matched pairs were formed 
for each participating institution. Criteria for matching were age,
sex, theme score, and a score representing combined performance on 
the CEEB and the COOP. Students were matched exactly on sex and 
theme score, within one year on age, and within three points on the 
combination of CEEB and COOP (Z-score). 



The matching may be illustrated from actual data from three
pairs of students. The numerals represent, in order, the student's
sex (1 for male, 2 for female), total theme score (sum of two ratings),
year of birth, and combined objective test score.



Subgroup        Sex    Total Theme Score    Year of Birth    Z-Score

Experimental     2            10                1945           111
Control          2            10                1946           111

Experimental     1             6                1945            85
Control          1             6                1946            85

Experimental     1            11                1946           111
Control          1            11                1946           113

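
The matching criteria (exact agreement on sex and theme score, within one year on age, within three points on the combined score) can be expressed as a small predicate plus a search of the control pool. The sketch below is illustrative only: the class and function names are hypothetical, and the report does not say how the computer ranked candidates, so "best possible match" is assumed here to mean the smallest combined-score gap, with the age gap as a tie-breaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    sex: int         # 1 for male, 2 for female
    theme: int       # total theme score (sum of two ratings)
    birth_year: int
    z: int           # combined objective test score


def is_match(e, c):
    """True if control c satisfies the stated criteria for experimental e."""
    return (e.sex == c.sex
            and e.theme == c.theme
            and abs(e.birth_year - c.birth_year) <= 1
            and abs(e.z - c.z) <= 3)


def match_pairs(experimentals, controls):
    """Pair each experimental student with the closest eligible control."""
    pool = list(controls)
    pairs = []
    for e in experimentals:
        candidates = [c for c in pool if is_match(e, c)]
        if candidates:
            # Assumed ranking: smallest score gap, then smallest age gap.
            best = min(candidates,
                       key=lambda c: (abs(e.z - c.z),
                                      abs(e.birth_year - c.birth_year)))
            pool.remove(best)  # each control is used at most once
            pairs.append((e, best))
    return pairs


# The three illustrative pairs from the table above.
experimentals = [Student(2, 10, 1945, 111), Student(1, 6, 1945, 85),
                 Student(1, 11, 1946, 111)]
controls = [Student(2, 10, 1946, 111), Student(1, 6, 1946, 85),
            Student(1, 11, 1946, 113)]
pairs = match_pairs(experimentals, controls)
```

With the data from the table, all three experimental students find an eligible control, including the third pair, whose combined scores differ by two points.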

The combining of the scores of the two objective tests was
accomplished by using the CEEB Standard Rating and the COOP Converted
Score, transforming each into a new standard score on a scale having
a mean of 50 and a standard deviation of 10, and adding the two resulting
transformed scores. The computer was instructed to examine the scores
of an experimental student and to search the control pool for the best
possible match. As indicated in the discussion above, the ratio between
the experimental pool and the control pool was approximately one to two.
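
The score combination described above can be sketched as follows. The sketch makes two assumptions not stated in the report: each test is standardized against the mean and standard deviation of the group being matched, and the population form of the standard deviation is used. The function names and sample scores are hypothetical.

```python
from statistics import mean, pstdev


def to_standard_scores(raw):
    """Rescale raw scores to a mean of 50 and a standard deviation of 10."""
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


def combined_scores(ceeb, coop):
    """Sum of the two transformed scores for each student."""
    return [a + b for a, b in
            zip(to_standard_scores(ceeb), to_standard_scores(coop))]


# Hypothetical CEEB Standard Ratings and COOP Converted Scores.
ceeb = [400, 500, 600]
coop = [150, 160, 170]
z = combined_scores(ceeb, coop)
```

Because each transformed score averages 50, the combined score averages 100 across the group, which is in line with the values of 85 to 113 shown in the illustration of matched pairs.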



Evaluative Instruments 



Three tests of performance in composition were used: the
Cooperative English Tests: English Expression (COOP), the College
Entrance Examination Board English Composition Test (CEEB), and a
theme.



Objective tests. COOP and CEEB are objective tests. The COOP
appealed to the investigators because it had been employed in previous
research at the University of Northern Iowa and seemed to serve as a
reasonably satisfactory indirect measure of student writing ability.
The CEEB, unlike the COOP, is a "secure" test. It is changed from
administration to administration and a serious attempt is made to
assure that students will have no prior access to any of the test
items. It was included in part because of its greater security and
in part because of a high correlation which had on one occasion been
secured between performance on it and evaluations of writing samples.
Following is a list of the specific test forms employed on the successive
testing occasions:



Testing Date        COOP              CEEB

September 1964      Forms A & B       GB03
January 1965        alternating       HB01
May 1965            at each           HB02
May 1966            university        JB02


The COOP contains 90 items: 30 on Effectiveness and 60 on Mechanics.
Total time limit is 40 minutes. The CEEB contains from 100 to 110
items and has a total working time of 60 minutes, with 20 minutes
recommended for each of three sections. From test form to test form
the elements tested by the CEEB vary somewhat. Representative elements
include paragraph organization, construction shifts, sentence
correctness, and usage. The various forms of the test are regarded as
equivalent but not parallel.

Theme. The theme was a paper written within a two-hour period 
on a single topic provided by the investigators. Students were urged 
to remain for the full two-hour period, though they were allowed to 
leave after an hour and twenty minutes. An explanation of the method 
for selecting topics, a theme instruction sheet, and the topics used 
on the various testing dates are included as Appendix A. 

Themes were evaluated by teams selected by Fred Godshalk,
Chairman of Test Development in the Humanities at the Educational 
Testing Service, from the pool of readers used by the Educational 
Testing Service in its theme-reading program. These teams were 
used because of their wide experience with theme reading and 
because many of the same readers would be used on successive 
scoring occasions. 

The ETS readers were accustomed to a 4-point scale. The 
University of Northern Iowa investigators preferred a 9-point 
scale. The goal was to employ a scoring scale which would permit 
the separation of the themes into a reasonable number of quality 
levels without presenting the evaluators with so many rating 
categories that undue time would be consumed in pondering fine 
distinctions. A compromise was adopted: a 9-point scale (1 to 9)
with emphasis on 2, 4, 6, and 8.

When Mr. Godshalk communicated his standards to the readers,
they were asked to think of the normal curve as split in the
middle, with each segment so created split again halfway between
the median and the extreme. This created four categories: much
below average, below average, above average, much above average.
It did not provide specifically for the average rank. Readers,
already accustomed to the 4-point scale, found it easy to use 2,
4, 6, and 8 as their main grades, but they were able also to use
the odd numbers whenever it seemed that a particular paper had
some characteristic requiring a grade between two of the even
numbers. Since each paper was read by two readers and the ratings
summed, the total possible range of scores for a single paper was
from 2 to 18. An explanation of the reading procedure is given in
Appendix C.

It is recognized that the validity of these evaluations
depends upon the degree to which Mr. Godshalk's judgment of
student writing, as modified by discussion with the readers, is
sound. Mr. Godshalk has an unusually wide background in
evaluating the writing of college-bound high school seniors.
The readers were from a variety of geographical backgrounds and
a wide range of educational institutions. Mr. Godshalk has for
years supervised groups of readers like these; the readers have 
worked together as teams in just such reading situations. Though 
neither Mr. Cowley nor Mr. Jewell consistently compared their 
evaluation of sample themes with that of the groups, when they 
did, there was no pronounced disparity between their ratings and 
those of the readers. In the judgment of the investigators, the 
validity of theme evaluations is as high as it is possible to 
achieve in a project of this sort. 



Godshalk, Fred, Frances Swineford, and William E. Coffman.
The Measurement of Writing Ability, New York: College Entrance
Examination Board, 1966.

A different topic was used on each testing occasion, the themes
were evaluated at different times, and reader personnel shifted from 
reading to reading. For these reasons, to report differences between 
testing dates as gains would be misleading, and this has not been done. 
Rather, differences between the experimental subgroup and control sub- 
group after the exact matching in September 1964 are presumed to result 
from the absence or presence of instruction. 



Reliability 

Cooperative English Tests: English Expression. This instrument,
published in 1960, is composed of two parts: "Part I:
Effectiveness," thirty items; and "Part II: Mechanics," sixty items.
The time limits are 15 minutes and 25 minutes respectively. A student's
score is the total number of correct responses. This raw score is
transformed into a Converted Score by means of a table provided by the
publishers of the test. For Form 1A, the possible range in converted
scores is from 115 (raw score of 0) to 191 (raw score of 90). For the
two forms of the test (1A, 1B) recommended for use with college freshmen
and sophomores, the investigators were able to find reliability
evidence only for the twelfth grade level. The correlation between
parallel forms was 0.84 and the standard error of measurement was on the
order of 4.00 converted score units.



The College Entrance Examination Board English Composition Test.
This is one of the CEEB achievement tests. Evidence about the
functioning of this instrument seems to be directly concerned with
validity. This is reflected in one of the earlier reports on the
instrument, which appeared with the title "Composition Test Shows High
Validity on Reliable Criterion of Writing Ability." The excellent
84-page report called The Measurement of Writing Ability also dealt
primarily with the validity of the College Entrance Examination Board
English Composition Test (CEEB). It is realized that to achieve
validity a test author must at the same time achieve reliability. A
third source of information was The Sixth Mental Measurements Yearbook.
Holland Roberts, one of the three reviewers of the test, commented on
reliability: "For the composition test a Kuder-Richardson formula 20
reliability of 0.85 and a standard error of measurement of 39 is
reported, indicating satisfactory discrimination among the members of
the test group."
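
The reliability coefficients and standard errors of measurement quoted in this section can be related through the usual classical test theory formula, SEM = SD x sqrt(1 - r). The report does not state that the published figures were derived this way, so the sketch below is only a consistency check under that assumption; the function name is hypothetical.

```python
from math import sqrt


def implied_sd(sem, reliability):
    """Score standard deviation implied by SEM = SD * sqrt(1 - r)."""
    return sem / sqrt(1 - reliability)


# Figures quoted in this section.
coop_sd = implied_sd(4.00, 0.84)  # COOP converted score units
ceeb_sd = implied_sd(39, 0.85)    # CEEB scaled score units
```

Under this assumption the COOP figures imply a standard deviation of about 10 converted score units, and the CEEB figures imply one of roughly 100 scaled score points.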



3 "Composition Test Shows High Validity on Reliable Criterion
of Writing Ability," ETS Developments, XI (January 1963) 1 & 4.

4 Godshalk, op. cit.

5 Roberts, Holland [a review of the CEEB English Composition
Test], Sixth Mental Measurements Yearbook. Ed. Oscar K. Buros.
Highland Park, New Jersey: Gryphon Press, 1965, p. 590.

Theme. The theme test consisted of an impromptu paper, 300-500
words in length, written within a two-hour period. A new topic was
used at each testing session, and at each session only one topic was
provided. Typically, the topic consisted of a quotation set in a
framework intended to link the topic and the student's experience
(see Appendix A). Experimental and control students wrote at the
same time and in the same place.

Each theme was evaluated by two readers working independently 
(see discussion, page 41). Each reader assigned each paper a 
numerical value on a scale extending from 1 to 9. It is thus 
possible to examine the extent of between-reader agreement in 
assigned ratings. 

As stated in the Interim Report, page 55, the investigators
believe that a meaningful basis for thinking about theme reliability
is in terms of the extent of agreement between the two independent
ratings of each theme. For the present discussion, the theme scores
of 90 matched pairs of students were examined. The 90 matched pairs
were all of the pairs available at one university in May 1965; they
are included among the 365 matched pairs, the total for four
institutions,* whose theme performance of May 1965 is reported in Table
XXVI. The tabulation below displays the inter-reader consistency
in theme ratings for the 180 students (90 experimentals plus 90
controls).

Difference in Two
Ratings of Theme        Number of Themes

       0                       46
       1                       62
       2                       52
       3                        9
       4                        9
       5                        2

Mean Difference = 1.33

This tabulation gives only the absolute value of the differences. 
An estimate of a reliability coefficient derived from a distribution 
of differences in the two assigned ratings would need to take into 
account the direction as well as the amount of the differences. 
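
As a check on the tabulation, the short Python sketch below recomputes the mean absolute difference and the agreement counts from the six frequencies given above. It is an illustration only; the variable names are assumptions and are not taken from the investigators' own computer programs.

```python
# Number of themes at each absolute difference between the two readers'
# ratings, copied from the tabulation above (180 themes in all).
counts = {0: 46, 1: 62, 2: 52, 3: 9, 4: 9, 5: 2}

total = sum(counts.values())

# Mean absolute difference: each difference weighted by its frequency.
mean_abs_diff = sum(diff * n for diff, n in counts.items()) / total

# Themes given identical ratings, and themes rated no more than 1 point apart.
same_rating = counts[0]
within_one = counts[0] + counts[1]

# Themes whose two ratings differed by more than 2 points.
more_than_two = sum(n for diff, n in counts.items() if diff > 2)

print(total, round(mean_abs_diff, 2), same_rating, within_one, more_than_two)
```

Because only absolute values are tabulated, a sketch of this kind cannot recover a reliability coefficient; it can only confirm the descriptive figures.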



* After the first semester the data at institution 4 were limited 
and incomplete and therefore not included in the totals. 
Approximately one-fourth of the 180 themes were assigned the
same rating by the two independent readers. More than half of the
180 themes (108) were rated no more than 1 point apart. Only 20 of
the 180 papers showed an inter-reader discrepancy of more than 2
points. The maximum inter-reader discrepancy was 5 (for 2 of 180
themes). The maximum possible inter-reader discrepancy was 8.

The degree of inter-reader consistency portrayed in the above
tabulation can also be represented by a coefficient of correlation.
The tabulation below indicates that the Pearson product-moment r
between the two sets of ratings for the 90 experimental themes was
0.22, for the 90 control themes 0.18. Thus if the theme were
regarded as a 9-point test, the coefficient of correlation of rating
consistency would be on the order of 0.20.









                          Reader 1        Reader 2       Reader 1 + 2
Subgroup        N     r   Mean   S.D.    Mean   S.D.    Mean   S.D.

Experimental   90   0.22  4.98   1.30    4.83   1.34    9.81   2.05
Control        90   0.18  4.92   1.34    5.03   1.31    9.96   2.04


However, it is more appropriate to regard the theme as an 18-point
test, for the score used for each student was the sum of the
two ratings, with a potential range of 2 to 18. The coefficient of
0.20 could then be conceived as the correlation between scores on
two readings of a half-test. Actually, however, it is the range
of the rating scale rather than the length of the test which is
being doubled. In such a context it is possible to estimate the inter-
reader correlation on an 18-point scale by basing the correlation on
two ratings of the same test. The Spearman-Brown Prophecy Formula
would yield a coefficient of 0.33 in this event.
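
The step from 0.20 to 0.33 can be reproduced with the general Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), with k = 2. The Python sketch below is a minimal illustration; the function name and its lengthening parameter k are assumptions made for the example.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Spearman-Brown estimate when a measure is lengthened by a factor k."""
    return k * r / (1.0 + (k - 1.0) * r)

# Correlation between two single readings (about 0.20 in the text).
r_single = 0.20

# Estimated coefficient for the sum of the two ratings (k = 2).
r_summed = spearman_brown(r_single)

print(round(r_summed, 2))
```

With k = 2 the formula reduces to 2r / (1 + r), which for r = 0.20 gives 0.40 / 1.20.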



The foregoing discussion of theme rating reliability in terms 
of coefficients of correlation suggests the complexity of the 
assumptions, interpretations, and arbitrary decisions involved. For
most purposes the extent of agreement between two independent ratings 
assigned to a single paper, as illustrated above for 180 papers, 
provides the clearest picture of theme-rating reliability. 

Inter-reader agreement represents only one aspect of theme
reliability. Involved also is the fact that a student's performance
probably differs from day to day and from topic to topic. Because
of problems like these, though there is considerable acceptance of a
theme as a desirable form of measuring instrument in English composition,
there is considerable doubt that ratings can be assigned reliably.
Thus, degree of inter-reader consistency in assigned ratings is
of major interest. While the grading of any essay test is difficult,
the grading of themes is especially complex, particularly
because there is a minimum of commonality among satisfactory
responses to a topic.



FINDINGS AND ANALYSIS 



General Data 



The data are treated first to show the nature of the persisting
samples and then to clarify the results at each testing
period on each criterion measure. Testing was done on four occasions:
beginning of the first semester, end of the first semester, end of
the second semester, end of the fourth semester. The numbers of
matched pairs were, respectively, 1,040 (original group), 597, 365,
and 122.



The Samples 

Table I presents a composite picture of the entering freshmen 
at the participating universities, reporting their performance in 
September 1964, on seven variables. The number of freshmen per 
institution varied from 705 to 943, selected in the manner described 
on page 38. The 4,190 freshmen, representative of the freshmen 
entering the five participating universities in September 1964, 
constitute the sample from which all subgroups were drawn. The data 
in Table I provide evidence of the extent to which the persisting 
experimental and control subgroups, composed of matched pairs of 

students, remain representative of the parent group. None of the
information in Table I involves student performance after September
1964.

Line one shows the performance of the 4,190 students (the
experimental pool plus the control pool) in September 1964. For
example, their mean percentile rank in high school class was 67.02.*



*The percentile rank data were reported by four of the
participating institutions as the percentage of individuals with a
high school rank lower than that of the given individual. One
institution reported the tenth in which each individual ranked; in
this instance the investigators used 95, 85, 75, ... . We have
computed mean percentile rank, realizing the limitations of such a
procedure.
Line two indicates that the 1,408 members of the experimental pool
had a mean percentile rank in high school class of 66.37, while
line three indicates that the 2,782 members of the control pool had
a mean percentile rank in high school class of 67.35. On the 
remaining six variables, means for the experimental pool and the 
control pool also show a close similarity. 
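
The three means quoted from Table I are mutually consistent: the figure for all 4,190 students is the size-weighted mean of the two pool means. The Python sketch below, offered only as an illustrative check with assumed variable names, shows the arithmetic.

```python
# (number of students, mean percentile rank in high school class)
# for the experimental pool and the control pool, as quoted from Table I.
pools = [(1408, 66.37), (2782, 67.35)]

n_total = sum(n for n, _ in pools)

# Weighted mean: each pool mean counts in proportion to its size.
pooled_mean = sum(n * mean for n, mean in pools) / n_total

print(n_total, round(pooled_mean, 2))
```

The unweighted average of the two pool means would be 66.86; the weighted figure is pulled toward the larger control pool.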

Establishment of matched pairs resulted in 1,040 members of 
the experimental pool being paired with members of the control 
pool. The 1,040 matched experimentals made up the experimental 
subgroup; the 1,040 matched controls made up the control subgroup. 

The fact that matching was exact for Total Theme Rating may be seen
in the means of 9.00 and standard deviations of 2.22. The subgroup 
means were also similar on the other variables. For example, mean 
percentile rank in high school graduating class was 66.33 for the 
experimentals and 67.92 for the controls. It is worthy of note that 
the means for the matched experimentals and the matched controls 
were close to the means for the respective pools. That is, the 
process of forming matched pairs yielded experimental and control 
subgroups representative of the parent group — the 4,190 entering fresh- 
men who constituted the project pool. 

Attrition reduced the number of matched pairs from 1,040 in 
September to 597 in January. In the matched pairs design a complete 
matched pair must be dropped if only one member of the pair leaves.* 

The degree to which the two subgroups have been "refined" by the loss 
of members over the first semester may be examined. Comparing line 
four with line six reveals that the 1,040 members of the experimental 
subgroup beginning the semester had a mean percentile rank in high 
school class of 66.33, while the 597 members of the subgroup completing 
the first semester had a mean percentile rank in high school class of 
69.31. The corresponding facts for the control subgroup are 67.92 
and 70.44. Analyses for the other variables show that there was a 
similar selectivity factor operating which caused the January sub- 
groups to be slightly superior to the larger parent subgroups. 

Data for the full freshman year also show the selectivity
associated with attrition. At the end of the academic year, the
percentile rank for the experimental subgroup (N=365) was 70.82,
while that of the control subgroup was 71.48, changes of 4.49 for
the experimental subgroup and 3.56 for the control subgroup.



*Though the matched pairs design is vulnerable to high
attrition, it has advantages which counterbalance this weakness.
See discussion, Appendix B.




Data for the complete two-year period likewise show the
influence of attrition and absences from test sessions. In May
1966 the persisting members of the experimental subgroup (N=122)
show a mean percentile rank in high school class of 73.93, and
the control 71.64.

Another way of examining the extent to which persisting sub-
groups of matched pairs excelled the original subgroups of matched
pairs is in terms of the placement of mean scores in the September
distribution of student scores. For this purpose the CEEB test and
the distribution of scores for the 4,159 students who, at the outset,
comprised the experimental pool plus the control pool will be used.
The tabulation below shows that, for the 1,040 matched experimental
students, the mean CEEB English Composition Standard Rating was
472.04 and that for the control students the mean was 472.13. Each
of these means lies at approximately the 49th percentile rank in the
distribution of the 4,159 scores.

                            Mean CEEB          Percentile Rank Based on
                            Standard Rating    September 1964 Distribution
Subgroup                    September 1964     (N=4,159)

September 1964
Experimentals (N=1,040)         472.04                  49

September 1964
Controls (N=1,040)              472.13                  49

May 1965
Experimentals (N=365)           484.19                  54

May 1965
Controls (N=365)                482.81                  53

May 1966
Experimentals (N=122)           496.88                  58

May 1966
Controls (N=122)                493.32                  57



This analysis has shown that whereas among the 1,040 original
matched pairs the typical CEEB score had a percentile rank of 49 in
September (close to the expected 50), for the 122 matched pairs who
completed testing through two years of college the typical CEEB score
had a percentile rank of about 58 in September. Thus the degree of
selectivity over the two years was on the order of 8 to 10 percentile
rank points on the CEEB.






At the end of the freshman year (May 1965), 365 matched
pairs remained. The experimental subgroup had a September CEEB
mean of 484.19, which corresponds to a percentile rank of 54 in
the distribution of the 4,159 scores. The corresponding figures
for the control subgroup were 482.81 and 53.

The tabulation above also shows the percentile rank in the
September 1964 student score distribution for the mean scores of the
members of experimental and control subgroups who completed the May
1966 testing. For the 122 experimentals the September CEEB mean
was 496.88, and the percentile rank 58. The 122 controls had a mean
of 493.32, and a percentile rank of 57.



All Tests — September 1964 through May 1966 



Whereas in Table I all test scores were those available in 
September 1964, both Table II and Table III present performance 
at successive testing occasions. Table II presents the performance 
at each of the four successive testing periods beginning with 
September 1964, of the 122 matched pairs who completed the entire 
testing program; Table III portrays the performance of all persisting 
matched pairs at each of these four successive testing periods. 
Inspection of this table will reveal the differences in performance 
on each of the criterion measures for the two subgroups at the 
beginning of the fall semester, 1964-65; at the end of the fall 
semester, 1964-65; at the end of the spring semester, 1964-65; and 
at the end of the spring semester, 1965-66. 



The experimental students did not receive instruction in 
freshman composition; the control students did. The data in Table 
III permit the key comparisons of the project; those between the 
performance of the experimental and control subgroups on the 
criterion measures at successive points in their college careers. 



Intercorrelation Data 



Table IV shows product-moment coefficients of correlation
for all possible pairs of variables among a set of eight variables. 
The 597 students are the experimental members of matched pairs who 
completed the first semester of the freshman year at the partici- 
pating institutions. 



The following specific points may be noted: 

How did percentile rank in high school class correlate 
with all the measures of English composition ability? Between 
0.20 and 0.30. 
NOTE: Small, but perturbing, problems with the rounding of decimal values were encountered with portions
of this table. The investigators recognize that in practically every table in this report there are spots at 
which the "rounding dilemma" may have produced slight inconsistencies in the reported values. 
552.11 75.96 4.74 1.49 4.65 1.46 9.39 2.59

Theme Rating
January 1965 0.20 0.46 0.46 0.40 0.44 0.42 0.47

Combination of Cooperative English Test: English Expression and College Entrance Examination Board English
Composition Test, September scores. See page 39.



How did the September scores on the two objective tests
correlate with the January theme scores? 0.46 for COOP; 0.44 for
CEEB; 0.46 for COOP + CEEB (Z-score).



How did the January scores for the two objective tests 
correlate? 0.56 for COOP vs. CEEB. 



How did the September scores and January scores correlate?
[Illegible] for COOP; 0.65 for CEEB; 0.47 for Theme.

How did the September COOP and CEEB scores correlate? 0.64.



Did the January scores on the CEEB correlate higher with
the September CEEB scores than with the September COOP scores?
Possibly slightly; 0.65 with CEEB, 0.63 with COOP.



The above intercorrelation data for the experimental
students are similar enough to the data for the control members
of the matched pairs which are reported in Table V that no detailed
discussion of Table V is included. In general, the magnitude of
the correlation coefficients is in line with those for other
similar situations, including the Interim Report of the present
project.



First Semester Sample 



COOP, September 1964-January 1965 



Table VI is the first in the series of tables in which,
for each of the three measuring instruments, student performance
is analyzed to show basic comparisons of test performance for
persisting experimental and control students: within subgroups
between beginning and final means, and between subgroups.
Means, standard deviations, r's, t's, and male and female
comparisons are also displayed in these tables.



Overall performance. The primary comparison in Table VI is
between the means for the experimental subgroup and the control
subgroup after the first semester of college (1964-65). On COOP the
difference in means was 1.01: 164.94 for the controls and 163.93
for the experimentals. The correlation between the scores of the
597 matched pairs of students was 0.52. The t-value of 3.273 is
significant. It is noteworthy that a relatively small difference
in means, just over one converted score point, is significant.
Theme Rating
January 1965 0.19 0.40 0.36 0.43 0.37 0.33 0.37

Combination of Cooperative English Test: English Expression and College Entrance Examination Board English
Composition Test, September scores. See page 39.

* Significant at 0.05 level (two-tailed test).




One of the factors in this result is the relatively large sample
(N=597) which yielded a relatively small standard error (S.E.=0.31).
Relationships between size of sample and size of standard error may 
be seen in data from the following tabulation which is calculated 
from data in the Interim Report which preceded the present report.^ 

                   Number of        Difference    Standard Error of
Subgroups          Matched Pairs    in Means      the Difference

End-of-first
semester               166            0.59             0.51

End-of-second
semester               113            0.66             0.75

End-of-fourth
semester                31            0.23             1.28



In Table VI the experimental-control mean difference in
January is shown as 1.01; the t-value is 3.273, which is significant.
The standard error is 0.309. This indicates superiority for the
control students, those who had received a semester of composition
instruction.

These figures, when compared to those reported in a relevant
section of the Interim Report, indicate the value of relatively
large samples when differences between means are small.^ In the
pilot study, with only 31 matched pairs, the difference in means
on the COOP between the experimental subgroup and the control
subgroup in January of the first year was 3.00, the standard error
1.13, and the t-ratio 2.65. Had the standard error for the 597
matched pairs in the current study been 1.13, the t-value would
have been 0.893, which would not have been significant, instead
of 3.273, which is significant.
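
The relationship used in this comparison is simply t = (difference in means) / (standard error of the difference). The Python sketch below reworks the three ratios from the rounded figures quoted in the text. The function name is an assumption, and the large-sample two-tailed critical value of about 1.96 at the 0.05 level is supplied here for orientation rather than taken from the report.

```python
def t_ratio(mean_difference: float, standard_error: float) -> float:
    """t-ratio for a difference between matched-pair means."""
    return mean_difference / standard_error

# Current study, January COOP: 597 matched pairs.
t_current = t_ratio(1.01, 0.309)

# Pilot study, January COOP: 31 matched pairs.
t_pilot = t_ratio(3.00, 1.13)

# Hypothetical case discussed in the text: the current difference in
# means combined with the pilot study's larger standard error.
t_hypothetical = t_ratio(1.01, 1.13)

print(round(t_current, 2), round(t_pilot, 2), round(t_hypothetical, 2))
```

The first ratio comes out near 3.27 rather than exactly 3.273 because the inputs are rounded; the report's value was presumably computed from unrounded figures.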

This superiority for the control students over the experimental
students was found to be about the same for males as for
females. It will be seen from Table VI that the differences in
January means were 1.06 for males, and 0.98 for females. Both are
significant.



^Interim Report, Table IX, page 36.
The lower portion of Table VI compares males and females
of the experimental subgroup as well as males and females of the 
control subgroup. In January the females had mean scores about 
3.70 higher than the males in both the experimental and control 
subgroups. This superiority of the females is significant. 

It is instructive to examine the mean change scores from 
September to January. The mean gains varied from 1.64 to 2.84 
for the two principal subgroups and their male and female components. 
What is the meaning in terms of test performance of a change on 
the order of two COOP Converted Score points over a semester? 

If one works within the September distribution of test scores and 
their associated percentile ranks, it is evident that an increase 
of two Converted Score points may be achieved by an increase of 
three raw score points. Advances of this magnitude result in 
corresponding percentile rank increase of 10 or 11 for scores near 
the median and of 4 for scores having percentile ranks of 10 and 90. 

The most noteworthy finding presented in Table VI is the 
extent to which females as a subgroup excel males as a subgroup. 

The superiority of 3.77 or 3.69 is almost twice as large as the 
change in mean scores for the entire group during one semester 
of college. Thus at the beginning of the first semester, females 
possess a higher mean test score than the males do at the end of 
the first semester. 



Performance by ability quarters . In Table VII the emphasis 
is on performance of experimental students and control students 
at each of four ability levels. The ability levels were established 
on the basis of student performance in September 1964 on two tests: 
COOP and CEEB. A Z-score was obtained by combining derived standard 
scores for these tests.* The four ability levels were based on the 
4,136 students for whom Z-scores were available: the experimental
pool plus the control pool for all five institutions. The fact
that the highest quarter and the lowest quarter contain the smallest
numbers of students (N=126 and N=134) may be explained primarily by
the difficulties in matching near the extremes of the distribution.

Evidence presented in Table VII enables one to answer the
following question: Was the superiority of the control students
about equally present at all ability levels? There is considerable
fluctuation, the second highest quarter showing the greatest control-
experimental difference (1.91) and the third highest quarter least
(0.31). However, at all four ability levels the January mean for
the members of the control subgroup was higher than the mean of
the experimental subgroup.



*See discussion of Z-score, page 39.



This kind of review of the evidence by ability levels as 
well as for the overall subgroups will be reported for each of 
the first two semesters and for the full freshman year. 

Since for the total sample of 597 matched pairs the control 
mean was significantly higher than the experimental mean, it would 
be anticipated that there would be a statistically significant 
superiority in favor of the control subgroup at some of the ability 
levels. It was only in the second highest quarter that this was 
found. Such a straightforward analysis, although an over-simplification 
of what would be required in a thorough consideration of relationships
between main effects and interaction, is nevertheless useful in 
examining the extent to which findings for the various ability levels 
seemed to be consistent with the findings for the total subgroups. 




Performance by ability quarters by sex. Before proceeding
to a consideration of the evidence for the two sexes at each of four
ability levels, it is important to recognize a feature of the
situation which has a distinct bearing on the analysis.
A positive correlation exists between sex and performance on
tests of English ability with females outperforming males (Tables VI
and VIII). This results in the presence (both in the initial and
the surviving matched pairs) of a larger female-male ratio in the top
ability levels than in the bottom ability levels. In the top quarter
(Table VIII) the ratio is 99 to 27 (3.6 to 1) and in the lowest
quarter the ratio is 56 to 78 (0.72 to 1). The mean differences by
quarters between females and males tend to be systematically smaller
than the overall mean difference between the sexes. The reader may
note this fact by comparing the mean difference in January COOP of
3.77 between experimental females and experimental males overall
(Table VI) with the mean female-male differences at the four ability
levels as presented in Table VIII: 1.56, 1.38, 0.99, and 1.66.
To arrive at the overall mean difference by using the mean differences
at each of the four ability levels, it would be necessary to employ a
weighting procedure which took into account the fluctuation of the
female-male ratios at the four ability levels. This characteristic
of the data, while a proper reflection of the samples used, does
introduce a complication in interpretation of the evidence presented
in the quarters-by-sex tables (Tables VIII, XI, XIV, and XXIX).
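
The way unequal female-male ratios enlarge the overall difference can be seen in a small numerical sketch. In the Python example below, the four within-quarter differences and the counts for the highest and lowest quarters are taken from the text. The counts for the two middle quarters and all of the male means are HYPOTHETICAL values chosen only for illustration, so the resulting overall figure is not a reconstruction of Table VI.

```python
# (females, males, male mean, female-minus-male difference) by quarter.
# HYPOTHETICAL: middle-quarter counts and every male mean.
quarters = [
    (99, 27, 172.0, 1.56),    # highest quarter (counts from the text)
    (100, 70, 166.0, 1.38),   # second quarter (hypothetical counts)
    (85, 82, 162.0, 0.99),    # third quarter (hypothetical counts)
    (56, 78, 156.0, 1.66),    # lowest quarter (counts from the text)
]

n_females = sum(q[0] for q in quarters)
n_males = sum(q[1] for q in quarters)

# Each sex's overall mean weights the quarter means by that sex's own counts.
mean_females = sum(f * (male_mean + diff) for f, _, male_mean, diff in quarters) / n_females
mean_males = sum(m * male_mean for _, m, male_mean, _ in quarters) / n_males

overall_difference = mean_females - mean_males
largest_within_quarter = max(q[3] for q in quarters)

print(round(overall_difference, 2), largest_within_quarter)
```

Because females are concentrated in the upper quarters and males in the lower, the overall difference in this sketch exceeds every one of the within-quarter differences, which is the pattern described above.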

Table VIII completes a series which reports COOP evidence for
the first semester of the freshman year (1964-65). The uniqueness
of Table VIII is in the presentation of the facts for males and
females by ability level. In the highest quarter, within the
experimental subgroup, the January mean for the 99 females was
1.56 higher than the mean for the 27 males. This difference is
of about the order of the difference which prevailed on the
September scores (1.87). Among the control students in the highest
quarter, the females had a January mean which was 2.35 higher than
the male mean. In September the difference had been 2.12.

In addition to the top quarter, the lowest quarter also
revealed noticeable superiority of females over males on the COOP.
Within the experimental subgroup the mean difference in January
was 1.66; within the control subgroup, 2.45.

In the two middle quarters, there was a consistent but
smaller advantage in favor of the females on January scores. It
is noteworthy, however, that on the September scores, three of the
four comparisons show the males to be slightly ahead of the females.

The data in Table VIII illustrate a fact which the investigators
have emphasized: females out-perform males as groups,
appreciably and consistently, even at various ability levels. An
inspection of the basic data shows that only at the top edge of the
distribution (the top 2 percent) do males equal or excel females.
At the bottom edge of the distribution the reverse is true; the male
group falls below the females.



CEEB, September 1964-January 1965 

Overall performance. Table IX shows that the number of
matched pairs completing all September and January CEEB tests was
597. The facts presented are:

(1) Within experimental subgroup and within control subgroup:
September mean and standard deviation, January mean and
standard deviation, difference in means January minus
September, correlation, and t-ratio.

(2) Between experimental subgroup and control subgroup:
difference in September means, correlation, t-ratio;
difference in January means, correlation, t-ratio.

(3) Data described in (1) and (2), except for correlations,
by sex.

For the total experimental subgroup, the January mean was 29.37
higher than the September mean (507.12 minus 477.75). This change
in mean is significant; the t-ratio is 10.354 with 596 degrees of
freedom. For the total control subgroup, the mean gain was 30.47,
also significant (t=10.262, degrees of freedom, 596). Thus the
experimental students and the control students advanced about the
same amount on CEEB during the fall semester, both gaining
significantly.

An analysis of the January test scores reveals that the
mean for the control subgroup is 1.76 points higher than the mean
for the experimental subgroup. This difference on CEEB is not
significant (t=0.506). The data regarding September means for
controls and experimentals confirm the similarity achieved in the
matching process; the correlation of September CEEB scores between
members of matched pairs was 0.82.

The middle portion of Table IX shows the performance of
each sex in each of the two subgroups. For the males the mean
gains during the semester were 32.53 (experimentals) and 34.32
(controls). The corresponding figures for females were 27.32
(experimentals) and 27.98 (controls). Again, the evidence on
CEEB for the first semester shows that the experimental treatment
and the control treatment were about equally efficacious in
producing change in performance on the CEEB. Mean gains by males
exceed mean gains by females by approximately five points. The
t-ratios for January-minus-September means are of the order of
7.00 for males and for females.

The lower portion of Table IX shows that within both the
experimental subgroup and the control subgroup the mean for females
exceeded the mean for males significantly at the beginning and
also at the end of the first semester. The mean gains over the
semester were slightly higher for the males than for the females,
of the order of 33 points compared to 27 points. The superiority
of female performance over male performance at a given juncture is
of about the same magnitude as the superiority of an end-of-semester
mean over a beginning-of-semester mean for either of the sexes.
In other words, even among beginning college freshmen, the females
are about a semester ahead of the males on the CEEB as a measure of
writing ability, and this difference remains at the end of the first
semester.

Performance by ability quarters. From Table X it is possible
to determine whether the difference of 1.76 in overall means in 
January CEEB, favoring the 597 control students, resulted from a 
fairly uniform differential across the four ability levels. Such 
was not the case. For the lowest one-fourth of the students, the 
mean for the control subgroup was substantially higher than the 
mean for the experimental subgroup. For the three highest quarters 
of ability the means for the control subgroup were slightly lower 
than the means for the experimental subgroup. 
THE PERFORMANCE OF 597 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION BOARD
ENGLISH COMPOSITION TEST IN SEPTEMBER 1964 AND JANUARY 1965, BY QUARTERS: MEANS,
STANDARD DEVIATIONS, AND t-RATIOS: COMBINED INSTITUTIONS

[Table body not recoverable.]



Another noteworthy feature of Table X is in the column 
reporting difference in means, January minus September. It will be 
noted that in general these differences increase from the top 
ability level to the lowest ability level. For example, in the 
highest one-quarter the differences are 7.24 (experimental) and 2.97 
(control); in the lowest one-quarter the differences are 49.18 
(experimental) and 60.05 (control). This inverse relationship 
between mean gain and ability level, for both the experimental and 
control subgroups, is consistent with the data reported for COOP in 
Table VII, page 50, where the top one-quarter showed gains of 0.36 
(experimental) and 1.62 (control) while the lowest one-quarter showed 
gains of 3.66 (experimental) and 4.78 (control). In this kind of
analysis one must recognize the possibility that, for a high ability
level, the increase in scores on a second testing may be restricted by
the test ceiling. Inspection of our raw data indicates, for example, that
the top male on COOP in September had a perfect score, while the top
three females were only three points away from perfection. Obviously,
on a second testing, their room for improvement was slight. Furthermore,
regression toward the mean may provide part of the explanation of the
limited gains on the second testing. Conversely, for the lowest one-quarter,
there is ample room below the test ceiling, and some of the increase in
mean scores is attributable to regression rather than to actual gains.
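
The regression effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reanalysis of the study data: the score scale, the error size, and the function name `simulate_quarter_gains` are hypothetical, and quarters are formed from the first score alone rather than from the Z-score used in the report. No true change is built in, yet the lowest quarter appears to gain and the highest quarter appears to lose.

```python
import random
import statistics


def simulate_quarter_gains(n_students=4000, seed=0):
    """Mean second-minus-first change per ability quarter when no real change occurs."""
    rng = random.Random(seed)
    students = []
    for _ in range(n_students):
        true_score = rng.gauss(50, 10)         # stable ability (hypothetical scale)
        first = true_score + rng.gauss(0, 5)   # first testing = ability + measurement error
        second = true_score + rng.gauss(0, 5)  # second testing, independent error
        students.append((first, second))
    # Form quarters from the first score, highest first.
    students.sort(key=lambda pair: pair[0], reverse=True)
    size = n_students // 4
    gains = []
    for q in range(4):
        chunk = students[q * size:(q + 1) * size]
        gains.append(statistics.mean(second - first for first, second in chunk))
    return gains  # index 0 = highest quarter, index 3 = lowest quarter


quarter_gains = simulate_quarter_gains()
for label, gain in zip(["Highest", "2nd", "3rd", "Lowest"], quarter_gains):
    print(f"{label:8s} quarter mean change: {gain:+.2f}")
```

Because the apparent changes arise from measurement error alone, a pattern of larger gains in the lowest quarter cannot by itself be read as evidence of greater real improvement there.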

Performance by ability quarters by sex. Table XI completes the
presentation of data for CEEB for the first semester, 1964-65. This
table corresponds to Table VIII for COOP, both showing comparisons
between sexes within subgroups at each of four ability levels. The
right-hand portion of Table XI shows the superiority (or inferiority)
of means for females as compared to means for males. At each of the
four ability levels there are two comparisons: experimental males
with experimental females and control males with control females. Of
the eight resulting comparisons in September, the one that is noteworthy
is within the experimental subgroup in the lowest quarter: the mean
for females was 28.39 higher than the mean for males. In the three
highest quarters the female-male disparity was not conspicuously
different from zero in terms of the scale on which standard scores are
reported and favored the males as often as the females.

An inspection of the male-female comparisons in January suggests
two observations: one is that in the lowest one-quarter, the experimental
females maintained the significant superiority which they
displayed in September; the other is that in the highest one-quarter,
the control females showed a substantial (52.10) mean superiority in
contrast to a near-zero (3.86) superiority in September. The marked
superiority of the highest-quarter control females in January resulted
more from a substantial negative change (-34.93) on the part of the
males than from the modest (13.30) positive change over the semester by
the females.

It is desirable to compare CEEB evidence in Tables IX, X, and
XI with the COOP evidence in Tables VI, VII, and VIII. These two
tests were the ones which yielded September-to-January change
scores (for the themes, it was not appropriate to compute change
scores; see discussion, page 42). Do students who have received a
semester of instruction in freshman composition score higher as a
group on these objective tests at the end of the semester than
comparable students who have not received such instruction? The
answer is "Yes" if one judges by COOP scores, "No" if one judges by
CEEB. Do females and males benefit about equally from such
instruction? The answer is "Yes" for both the COOP and the CEEB.
Do students at four ability levels benefit about equally from such
instruction? The answer appears to be "No" for both the COOP and
the CEEB.



Theme Rating, September 1964 and January 1965 

Overall performance. Table XII presents theme performance for
the 597 matched pairs of students who completed the tests through 
the first semester (September 1964-January 1965). Theme performance 
is generally considered to be the most direct measure of writing. 
Theme evidence for this study is unique also in that the matching 
procedures used required that the experimental and control means in 
September be identical. Interpretation of January means of 
experimentals and controls is therefore free of the qualifications 
which would have been necessary had there been unequal September 
means and standard deviations. 

The N's involved in Table XII are all reasonably large. The
two largest subgroups are the 597 experimental students and the 597
control students, that is, the 597 matched pairs. Data in the top
portion of Table XII show that at the end of the first semester of
college, the mean theme performance of the controls was 0.27 higher
than that for the experimentals; the obtained t-value was 2.299, which
is significant. Students receiving instruction such as that given in
the first semester of freshman English composition performed significantly
better on the theme than students who did not receive such
instruction. The variable under investigation was the presence or
absence of instruction in freshman composition. In this experiment,
such instruction had a positive influence on student writing performance.

It will be recalled that for the analyses on COOP and CEEB
there was a significant advantage on the COOP for those who had
received instruction, but not on the CEEB.

*Significant at 0.05 level (two-tailed test).



Although this situation prevails for both males and 
females, the middle portion of Table XII reveals that the overall 
superiority in means for controls is due more to scores made by 
males than to those made by females. In January, control males 
had a mean total theme score which was 0.45 higher than that for 
the experimental males, whereas the control females had an observed 
superiority of 0.15 over the experimental females. The 0.45 for 
males was associated with a significant t-value. Thus the 
generalization above regarding the positive effect on college 
freshmen of one semester of instruction in composition may be refined 
by saying that these results are more distinctly characteristic of 
the male students than of the female students. 

The bottom portion of Table XII compares the performance of 
males and females within the experimental subgroup and the control 
subgroup. These data relate to the question of whether, on this 
direct test of writing performance, the females perform better than 
the males. The mean difference in favor of the experimental females 
in September was 1.00. Similarly, the mean difference in favor of 
the control females was 1.00. These differences are significant. 

How, then, do males and females compare in theme scores after 
a semester of college? The females have increased their superiority 
over the males, the difference in mean performance being 1.52 within 
the experimental subgroup and 1.22 within the control subgroup. 

Change scores on themes cannot be legitimately computed, owing to the 
difference in topic and time of evaluation; therefore one cannot, as 
was possible on the COOP and CEEB, analyze the change in score between 
the beginning and the end of the semester. 

A number of comments are pertinent. In the fundamental comparison
(597 experimental students against 597 control students on
January theme), the obtained mean difference of 0.27 favoring the
controls was significant. The 235 control males contributed more
than did the 362 control females to the January finding of overall
superiority of controls over experimentals. That is, control males
surpassed experimental males (0.45) to a greater extent than
control females surpassed experimental females (0.15).
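
The relation between the overall difference and the two sex-specific differences can be checked arithmetically. The sketch below assumes that the overall January difference is the pair-weighted average of the male and female differences quoted in the text; the variable names are illustrative only, and the inputs are the rounded figures as reported.

```python
# Pair counts and January theme differences (control minus experimental) quoted in the text.
n_male_pairs, n_female_pairs = 235, 362
diff_males, diff_females = 0.45, 0.15

total_pairs = n_male_pairs + n_female_pairs

# Overall difference as a weighted average of the two subgroup differences.
overall_diff = (n_male_pairs * diff_males + n_female_pairs * diff_females) / total_pairs

# Fraction of the overall difference contributed by the male pairs.
male_share = (n_male_pairs * diff_males) / (total_pairs * overall_diff)

print(total_pairs)
print(round(overall_diff, 2))
print(round(male_share, 2))
```

On these rounded inputs the weighted average agrees with the reported 0.27, and roughly two-thirds of it comes from the male pairs even though they make up less than 40 percent of the sample.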

The magnitude of these various between-subgroup differences
in theme score means may be thought of in terms of a
mean-difference-necessary-for-significance of about 0.25 to 0.30. Thus one of the
striking facts is the superiority of females over males, a condition
which prevailed at the outset and is usually accentuated during the
first semester of college attendance. There is a consistent tendency
for the females to perform better than the males in group comparisons
on all three criterion measures: COOP, CEEB, and Total Theme Rating.

The total sample of 597 matched pairs was composed of the
following five institutional N's:

Institution     Male Pairs     Female Pairs     Total

     1              65             135           200
     2              65              62           127
     3              34              90           124
     4               9              11            20
     5              62              64           126

  Totals           235             362           597

It is helpful to examine the extent to which the findings for the
combined institutions were observed consistently among the five
constituent institutions. In four of the five institutions, at
the end of the first semester, the control subgroup mean exceeded
the experimental subgroup mean. However, in only one of these
four institutions was the mean difference, in favor of the controls,
statistically significant. In all five of the institutions, the
females surpassed the males at the beginning of the freshman year,
and the differential increased by the end of the semester.

Performance by ability quarters. Table XIII contains total
theme means for the experimental students and the control students
at each of four ability levels. Since the means in September were
identical for any matched pair or subgroup of matched pairs, it is
especially informative to look at the January means. It will be
noticed that the control subgroup superiority of 0.27 for the total
group (bottom portion of the table) resulted from the lowest quarter scores
(mean difference of 0.67) and from the second high quarter scores (mean
difference of 0.31). In the top quarter and the third high quarter
the experimental and control students were essentially the same.

Performance by ability quarters by sex. From Table XIV it is
possible to see the mean January theme ratings of males and females
within the experimental and within the control subgroups at each of
four ability levels. These ability levels were established on the
basis of a combination of September COOP and CEEB scores. Table XIV
also contains additional descriptive information: the Z-score means
which were used in establishing the four ability levels, and the
means and differences in means on September theme ratings.

The three columns at the right-hand side of Table XIV
present the key comparisons. Generally in the eight comparisons,
the mean of January theme ratings for the females was higher than
for the males. The single exception was with the experimental
students in the highest quarter, where the mean for males was 0.08
higher than the mean for females. Another way of summarizing the
facts portrayed in Table XIV is by noting that the superiority of
females over males was most prominent in the two lowest quarters,
and within those two quarters, more so within the experimental
subgroup than within the control subgroup.

The following tabulation shows, for the 597 matched pairs
who completed the first semester, the number and percentage of
males and females who had been in the upper half and in the lower
half on the September ability distribution according to two determiners
of ability levels: the Z-score, a combination of two objective
tests; and Total Theme Rating.

PROPORTION OF MALES AND FEMALES IN JANUARY WHO WERE IN UPPER
HALF AND LOWER HALF OF SAMPLE IN SEPTEMBER

                              Number in          Men                Women
                              January       Number  Percent    Number  Percent

On September Z-score:
  Upper Half                    304           92      30         212      70
  Lower Half                    293          143      49         150      51
  Upper + Lower                 597          235      39         362      61

On September Theme Rating:
  Upper Half                    359          117      33         242      67
  Lower Half                    237          117      49         120      51
  Upper + Lower                 596          234      39         362      61



The analyses by ability level thus far have all been based on
the Z-score (objective-test based) ability levels. For the 597
matched pairs who completed the first semester testing, 235 (39.4%)
were males. Among the original set of 1,040 matched pairs in
September the percentage of males was 40.6. It is evident that
about the same proportion of males and females persisted through
the first semester. It is noted that of the 235 persisting men,
39.1 percent were in the upper half of the objective-test distribution,
while 60.9 percent were in the lower half. For the females, the
corresponding percentages were 58.6 in the upper half and 41.4 in
the lower half.
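
The tabulation and the paragraph above report two different percentages from the same counts: the tabulation gives the share of each half made up by men or women, while the paragraph gives the share of each sex falling in each half. A minimal sketch of both calculations follows, using the Z-score counts from the tabulation; the dictionary layout and the helper `pct` are illustrative only.

```python
# Persisting students by September Z-score half, from the tabulation.
counts = {
    "upper": {"men": 92, "women": 212},
    "lower": {"men": 143, "women": 150},
}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


men_total = counts["upper"]["men"] + counts["lower"]["men"]
women_total = counts["upper"]["women"] + counts["lower"]["women"]

# Share of the upper half made up by men (the tabulation's "Percent" column).
men_share_of_upper = pct(counts["upper"]["men"], sum(counts["upper"].values()))

# Share of each sex falling in the upper half (the figures quoted in the paragraph).
men_in_upper = pct(counts["upper"]["men"], men_total)
women_in_upper = pct(counts["upper"]["women"], women_total)

print(men_total, women_total)
print(men_share_of_upper, men_in_upper, women_in_upper)
```

Keeping the two denominators distinct avoids reading the tabulation's 30 percent (men as a share of the upper half) as if it were the 39.1 percent (upper-half men as a share of all persisting men).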



Persistence of matched pairs of students of the upper one-half
in ability and the lower one-half in ability was dependent upon
the ability measure employed. For the Z-score measure of ability,
persistence was approximately equal for the upper and lower halves
(304, upper; 293, lower). For the essay measure of ability, persistence
was greater for the upper half (359) than for the lower half
(237).

If the persisting men are categorized in terms of the
September theme distribution, it is noted that 50 percent were
in the upper half and 50 percent in the lower half (117 in each);
for the persisting women the percentages were 66.9 (242) and 33.1
(120). Thus among the 597 matched pairs who persisted through
the first semester, the superiority of the female portion over
the male portion in terms of September performance was slightly
greater for the theme criterion than for the objective-test
criterion.

Table XV contains information concerning the 596 matched
pairs of students whose performances were portrayed in the
tabulation on page 74. From the data of Table XV it is possible
to see the make-up of the sample by ability quarters as determined
by September theme rating. Within each quarter there is an indication
of the number of males and females and the mean scores within
each subgroup on September theme and on Z-score (two September
objective tests combined). Since the ability quarters were
established in terms of theme ratings (in contrast to Z-score on
preceding tables) and since matching between subgroups was perfect
on sex and theme rating, the differences among ability levels are
sharper in the "Theme" column than in the "Z-score" column. The
mean differences between adjacent quarters on theme for males were
2.12, 1.96, and 2.19; and for females 2.26, 1.87, and 1.96.
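
The adjacent-quarter differences quoted above can be recomputed from the September theme means in Table XV. The sketch below uses the means as they appear in this copy of the table; the function name `adjacent_differences` is illustrative. With these inputs the first male difference comes out as 2.23 rather than the 2.12 quoted in the text, while the other five agree, so one of those two figures may have been garbled in printing or transcription.

```python
# September theme means by ability quarter (highest to lowest), as read from Table XV.
theme_means = {
    "male": [11.78, 9.55, 7.59, 5.40],
    "female": [11.84, 9.58, 7.71, 5.75],
}


def adjacent_differences(means):
    """Difference between each quarter's mean and the next lower quarter's mean."""
    return [round(higher - lower, 2) for higher, lower in zip(means, means[1:])]


diffs = {sex: adjacent_differences(means) for sex, means in theme_means.items()}

for sex, values in diffs.items():
    print(sex, values)
```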



First Year Sample 



Table XV concluded a series of tables which presented 
the performance of the 597 pairs of students who completed the
testing at the end of the first semester (September 1964-January 
1965). The next series of tables, beginning with Table XVI and 
extending through Table XXXIII, will present data for the 365 
matched pairs who completed the entire 1964-65 academic year, and 
were tested in May 1965. These students constitute a subgroup of 
the 597 matched pairs whose performance has just been summarized. 
These 365 matched pairs of students were tested in May as well as 
in September and January. 



COOP, First Semester 

Overall performance. Table XVI presents COOP performance
of the 365 matched pairs at the beginning and at the end of the
first semester, 1964-65. Four main comparisons are given: between
subgroups in September; within subgroups, September to January;
between subgroups in January; between the male and female portions
of these subgroups. Table XVI may be compared with Table VI,
which presents COOP data for the 597 matched pairs who completed
the January 1965 testing. In both tables the means for January show
a superiority for the control subgroup over the experimental subgroup.
For the 597, the superiority was 1.01, and for the 365, the
superiority was 1.25. A noticeable variation between males and
females in the two samples was that, for the 365 matched pairs in
January (Table XVI), the control males were not substantially
superior to the experimental males, whereas for the 597 matched
pairs (Table VI), the control males were significantly superior
to the experimental males in January.



TABLE XV

THE 596 MATCHED PAIRS WHO PERSISTED THROUGH JANUARY 1965: MEANS ON
SEPTEMBER THEME AND SEPTEMBER Z-SCORE¹ BY SEX BY SUBGROUP BY
ABILITY LEVEL ESTABLISHED ON SEPTEMBER THEME RATING

                                                  Mean, Sept.   Mean, Sept.
Quarter          N     Sex       Subgroup         Theme         Z-Score

Highest 1/4      43    Male      Experimental     11.78         104.81
                 43    Male      Control          11.78         104.46
                114    Female    Experimental     11.84         109.61
                114    Female    Control          11.84         109.69

2nd High 1/4     74    Male      Experimental      9.55          98.12
                 74    Male      Control           9.55          98.12
                128    Female    Experimental      9.58         104.48
                128    Female    Control           9.58         104.66

3rd High 1/4     69    Male      Experimental      7.59          90.75
                 69    Male      Control           7.59          90.91
                 96    Female    Experimental      7.71          98.11
                 96    Female    Control           7.71          98.08

Lowest 1/4       48    Male      Experimental      5.40          84.00
                 48    Male      Control           5.40          83.83
                 24    Female    Experimental      5.75          91.00
                 24    Female    Control           5.75          90.75

Total Group     234    Male      Experimental      8.54          94.40
                234    Male      Control           8.54          94.36
                362    Female    Experimental      9.54         103.51
                362    Female    Control           9.54         103.58

¹Combination of Cooperative English Tests: English Expression and College
Entrance Examination Board English Composition Test, September scores.
See page 39.

One of the functions of this comparison of partially overlapping
samples is to permit inferences regarding the nature of the
matched pairs who were lost from the experiment between January 1965
and May 1965. Except for the control males, who constituted only
about 37 percent of the control subgroup in Table XVI, the two
samples, 365 pairs and 597 pairs, are very similar. Consequently,
the 232 pairs who vanished over the second semester apparently were
not appreciably different from the 365 matched pairs who remained.
To put it differently, the loss of the 232 matched pairs apparently
did not alter the representativeness of the remaining sample.

Performance by ability quarters. Table XVII, as compared to
Table XVI, shows the experimental-control comparisons at each of
four ability levels. A picture of the gains on COOP during the first
semester is presented in the column headed "Difference in Means,
January minus September." As was the case for the parent sample of
597 matched pairs (Table VII, page 59), the semester gains among the
sample of 365 matched pairs were relatively large in the lowest quarter
of ability. The 80 experimental students in the lowest one-quarter
showed a mean gain of 3.80; the 80 control students in the same
quarter had a mean gain of 5.34. These changes may be compared to
1.91 and 2.97 for the total group of 365 matched pairs.

For the overall sample of 365 pairs, the January COOP means
showed a superiority of 1.25 for the control subgroup. This overall
mean difference resulted from the following mean differences at the
four ability levels: 0.46 (highest), 2.16, 0.31, 1.75 (lowest). The
differences for the second and the fourth quarters are significant.
However, since there is no definite trend, the mean differences by
ability level should be interpreted cautiously.



CEEB, First Semester



Overall performance. Table XVIII presents the same comparisons
of CEEB data for the first semester that Table XVI gave
of COOP data for the 365 matched pairs surviving in May 1965.

For the experimental subgroup and the control subgroup the
mean gains on CEEB, September to January, are of the order of 30
standard rating points: 30.14 for the experimentals and 28.72 for
the controls. These gains are significant. Similarly, the analysis
by sex shows significant mean gains for males and for females. An
inspection of the January minus September column of Table XVIII
shows that the control males had the smallest mean gain, 20.52.

The key comparison is between the experimental and control
subgroup means in January. The two means were very close in January,
as they had been in September: the January difference of -2.79 is
not significant. The comparison of subgroups by sex also fails to
reveal any significant mean differences. The correlations of CEEB
scores within subgroups and between subgroups are in line with
expectation. The September-January comparisons yield related r's
of the order of 0.60. The between-subgroup r's, also related r's,
were approximately 0.80 in September and 0.40 in January.

The bottom portion of Table XVIII compares CEEB means for
males and females within the experimental subgroup and within the
control subgroup. The superiority of the females was about 26
standard rating points in September and increased to 31.40 within
the experimental subgroup and 38.96 within the control subgroup in
January, both differences being significant.

There was substantial agreement in January between the
findings for the 365 matched pairs available for the first two
semesters (Table XVIII) and the 597 matched pairs available at the
end of the first semester (Table IX, page 63).

Performance by ability quarters. From Table XIX, containing
CEEB data for the first semester by ability levels, it is evident
that mean gains were largest in the two lowest quarters. For the
larger sample (597 pairs; Table X, page 65) the semester mean gain was on
the order of 30 standard rating points. Table XIX shows that for
the 365 pairs, the mean gains were 20 standard rating points or less
for students in the highest two quarters in ability, whereas for
the students in the lower two quarters of ability, the four mean
gains were 49.77 (experimental), 33.80 (control); and 40.72
(experimental), 62.10 (control).







The comparison between the 365 experimentals and the 365
controls at the end of the first semester shows two interesting
facts about the lowest one-quarter: it was the only one-quarter
in which the control mean excelled the experimental mean, and this
was the only one of the four quarters at which the between-subgroups
difference is significant.

In summary, these CEEB data for the first semester of the
365 matched pairs indicate overall similarity between the performance
of the experimental and the control subgroups. On end-of-semester
comparisons between subgroups, the control subgroup excelled
significantly in the lowest quarter, while the experimental subgroup
had a slight but not significant advantage in the upper three quarters.
The two lowest quarters showed greater gain than the two upper
quarters.



Theme Rating, First Semester 



Overall performance. The data in Table XX, covering the
theme ratings for the first semester for the 365 matched pairs surviving
in May 1965, may be related to the first semester data for
the 597 matched pairs surviving in January 1965 (Table XII, page 69).
This is helpful in considering the extent to which the matched pairs
finishing the full freshman year are representative of the matched
pairs completing only the first semester of the freshman year. The
first two lines in Table XX show data for 365 matched pairs who were
included among the 597 matched pairs depicted in the first two lines
of Table XII. It may be noted in Table XX that in September the mean
theme rating for the smaller group (N=365) was 9.23, whereas the
corresponding figure for the larger subgroup (N=597) was 9.15. The
persistence rate for males was slightly less than for females:
57.0 percent and 63.8 percent, respectively. Such evidence of a
slight selective factor in this kind of longitudinal study would be
expected.



It was noted, from Table XII, that at the end of the first
semester the 597 control students had a mean theme rating which was
0.27 higher than the mean for the 597 experimental students,
and that this difference was significant. In Table XX it is shown
that for the 365 matched pairs, the mean difference on January theme
in favor of the controls was 0.26. Coupled with a relatively large
standard error, this mean difference is not quite significant.



In general, however, the analysis by sex for the 365 matched
pairs (Table XX) yielded results which were consistent with those
for the 597 matched pairs (Table XII). In both tables, the
September means for the females were higher than those for the males,
and in both cases, the January means showed an even greater superiority
for the females. That is, the initial difference did not
disappear, as it is sometimes averred to do.



Performance by ability quarters. Data in Table XXI show how
the overall mean difference of 0.26 on total theme rating in January
([illegible], control; 9.95, experimental) was apportioned among the four
ability levels. The overall mean difference is not quite significant,
but in the lowest quarter the observed mean difference of 0.69 is
significant. However, the fact that the observed mean difference
for the second quarter approached significance suggests the complexity
of control-experimental comparisons by ability level.



Two instructive facts are included among the auxiliary evidence
in Table XXI. One of these facts concerns the correlations between the
theme ratings of September and January, within subgroups within
ability levels, and overall. Within each of the ability levels, the
r's are low, typically in the 0.20's and 0.30's. The overall r's
are 0.44 and 0.34. The relative smallness of all of the r's may be
due in part to the relative unreliability of theme ratings (see discussion
of reliability, page 43). The second fact concerns the
influence of range of talent on the magnitude of correlation
coefficients; the r's within the ability levels tend to be lower
than the overall r's.



COOP, Second Semester 



Overall performance. Table XXII is the first in a series of
tables showing data for the second semester of the freshman year.
All in this series of tables involve the same 365 matched pairs
dealt with in Tables XVI through XXI.



From the column "Difference in Means, May minus January"
it will be seen that gains on COOP during the semester were
between 1-1/2 and 2 converted score points, and that these gains are
significant. Gains during the semester were roughly similar for
experimentals and controls, and for males and females.



The control subgroup began the second semester with a
significant mean superiority over the experimentals (1.25) and ended
the semester with a smaller, but still significant, mean difference
over the experimentals (0.79). The breakdown of the data by sex
shows that, among males, the controls began the semester 0.54
higher than the experimentals and ended the semester 0.28 higher; on
neither of the two occasions is the difference significant. For the
females, the controls had a significantly higher mean than the experimentals
at the beginning (1.66) and at the end (1.10).



The recurring fact is that the greatest differences in means
on COOP are found in comparisons between females and males, regardless
of whether they are experimentals or controls. Within the
experimental subgroup, the superiority of the mean for females over
the mean for males was 2.75 in January and 2.95 in May. The corresponding
figures within the control subgroup were 3.88 and 3.77.
All four of these differences in favor of the females are significant.

Performance by ability quarters. Data in Table XXIII enable
the reader to identify any student ability levels at which the
January-to-May gains on COOP are noteworthy. The lowest quarter of
ability showed a relatively high mean gain for the experimentals
(3.21) and a relatively low mean gain for the controls (1.09). At
the beginning of the second semester, the mean of the lowest one-quarter
of the control subgroup was 1.73 higher than the corresponding
experimental subgroup mean. At the end of the second semester, the
observed difference between these two lowest-quarter subgroups was
considerably less, 0.40.

The second highest quarter of ability is unique in that both 
the January and the May between-subgroup differences favoring the 
controls are significant (2.16 and 1.23 respectively). The mean 
difference in May was 0.93 smaller than the mean difference in 
January. 

It was only in the third high quarter that the control 
subgroup superiority was greater in May (1.41) than it had been in 
January (0.31); the 1.41 mean difference is not significant. 

Thus, in general, the second semester of instruction did not 
maintain the superiority of the controls over the experimentals. 

During the second semester, the experimentals out-gained the controls, 
who continued to receive instruction. 

In the analysis by ability level, the investigators were on
the alert for patterns of performance which would have implications
for curriculum and instruction. Thus if a noteworthy fact emerged
for the top one-half or the bottom one-half, or the top quarter or
the bottom quarter, one might see a possible application of such
findings to exemption or sectioning. With COOP data for the second
semester, no clear-cut pattern emerged. The lower half yielded two
facts which illustrate the absence of a pattern: the relatively
large gains by the controls of the third highest one-quarter and
the experimentals of the lowest one-quarter. The investigators wonder
whether such facts as those for this sample are chance findings or
whether they are indicative of the actual situation for the
population sampled.



CEEB, Second Semester



Overall performance. Table XXIV contains second-semester
CEEB data of the 365 matched pairs surviving in May 1965. Comparable
CEEB data of these same pairs for the first semester are in Table
XVIII, page 81. The within-semester analysis shows that during the
first semester the mean gains by experimentals (30.14) and by controls
(28.72) were very similar. During the second semester (Table XXIV)
two noteworthy facts emerge: the gains were smaller than for the
first semester and the controls outperformed the experimentals 12.74
to 2.65. Thus, for this sample, the increment for one year occurred
primarily in the first semester: for the experimentals almost all of
it and for the controls about two-thirds of it. All of this is
reflected in the column showing t-ratio. During the first semester
(Table XVIII) both the experimental subgroup and the control subgroup
made highly significant gains, whereas during the second semester
(Table XXIV) only the controls gained significantly.
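
The "almost all" and "about two-thirds" statements can be reproduced from the semester gains quoted above. The sketch assumes that the gain for the year is the sum of the two semester mean gains, which holds when the same 365 pairs are measured at all three testings; the names used are illustrative.

```python
# Mean CEEB gains in standard rating points, as quoted in the text.
first_semester_gain = {"experimental": 30.14, "control": 28.72}
second_semester_gain = {"experimental": 2.65, "control": 12.74}


def first_semester_share(group):
    """Fraction of the year's mean gain that occurred in the first semester."""
    year_gain = first_semester_gain[group] + second_semester_gain[group]
    return first_semester_gain[group] / year_gain


share_experimental = first_semester_share("experimental")
share_control = first_semester_share("control")

print(round(share_experimental, 2))
print(round(share_control, 2))
```

On these figures about 92 percent of the experimentals' gain for the year, and about 69 percent of the controls', fell in the first semester.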

For the 365 pairs of students, end-of-semester differences 
between means of the experimental subgroup and the control subgroup 
were not significant. At the end of the first semester the subgroup 
means differed by 2.79 favoring the experimentals; at the end of the 
second semester, they differed by 7.30 favoring the controls. The 
obtained mean difference of 7.30 standard rating points is associated 
with a t-value of 1.640, short of significance at the 0.05 level.
For 364 degrees of freedom a t of 1.97 is required for significance.
Under the given conditions of variability and correlation, an observed 
mean difference of 8.86 would be a significant difference. For 
example, had the control subgroup mean been 525.80 instead of 524.28, 
the difference would have attained significance. 

What would be required to have a mean difference of 8.86 
standard rating points? The CEEB has 100 to 110 items, depending 
on the form. An increase of one raw score point is typically 
associated with an increase of about six standard rating points.
Thus if, on the average, members of one of these two subgroups had
made one or two more correct responses than did their counterparts 
in the other subgroup, the resulting subgroup means would have 
differed significantly. 
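
The conversion behind this estimate can be sketched in a few lines. This is an illustration only, not part of the original analysis; it assumes the report's approximate rate of six standard rating points per raw score point, and the function name is hypothetical.

```python
# Convert a difference in CEEB standard rating points into an
# approximate number of raw-score points (correct responses).
# Assumption taken from the text: one raw point is worth about six
# standard rating points. The function name is illustrative only.

POINTS_PER_RAW_ITEM = 6.0  # approximate conversion stated in the report


def raw_points_equivalent(standard_point_difference: float) -> float:
    """Return the raw-score difference equivalent to a rating difference."""
    return standard_point_difference / POINTS_PER_RAW_ITEM


required = raw_points_equivalent(8.86)  # difference needed for significance
observed = raw_points_equivalent(7.30)  # difference obtained in May 1965

print(f"required: {required:.2f} raw points")
print(f"observed: {observed:.2f} raw points")
```

Under this assumption the required difference works out to roughly one and a half correct responses, which is consistent with the "one or two more correct responses" cited above.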

A reference distribution of student scores may serve as a 
vehicle for additional thinking about the within-subgroups and 
between-subgroups analyses presented in Tables IX, XVIII, and XXIV. 

The following tabulation shows, for each of six mean standard 
ratings, two for each testing period, the percentile rank which each 
such rating had in a distribution of September CEEB scores (N=4,159) 
and in a distribution of May CEEB scores (N=730).

*Significant at 0.05 level (two-tailed test)

                                                Percentile Rank in a Distribution
                              Rounded Mean      of Student Scores in
Subgroup                      Standard Rating   September 1964      May 1965

Experimentals (Sept. 1964)         485               54+               38
Controls      (Sept. 1964)         484               54                37
Experimentals (Jan. 1965)          514               65                50
Controls      (Jan. 1965)          511               64                49
Experimentals (May 1965)           517               67                51
Controls      (May 1965)           524               68                55

This kind of analysis also emphasizes the similarity of the
observed means of experimentals and controls. In a distribution of
student scores, the corresponding percentile rank differences were
typically 1. For example, the January means of 514 and 511 correspond
to September percentile ranks (among individual scores) of 65
and 64, respectively. The mean gains during the semester may be
examined in a similar manner. If the typical end-of-semester score
is placed in a beginning-of-semester distribution of student scores,
it appears that the improvement is 10 or 11 percentile rank points.
This finding, only a modest increment in performance as a result of
an additional semester or year of instruction plus maturation, is
illustrative of a rather general condition which is perhaps not
sufficiently appreciated by teachers of English.
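
The percentile-rank gains quoted above can be recomputed from the tabulation. The sketch below is illustrative only: it uses the September 1964 reference column, treats the printed "54+" as 54, and its variable and function names are hypothetical.

```python
# Percentile ranks of the rounded subgroup means within the
# September 1964 distribution of student scores, copied from the
# tabulation. The printed value "54+" is treated as 54 here.
SEPT_DISTRIBUTION_RANK = {
    ("Experimentals", "Sept. 1964"): 54,
    ("Controls", "Sept. 1964"): 54,
    ("Experimentals", "Jan. 1965"): 65,
    ("Controls", "Jan. 1965"): 64,
    ("Experimentals", "May 1965"): 67,
    ("Controls", "May 1965"): 68,
}


def percentile_gain(subgroup: str, start: str, end: str) -> int:
    """Change in percentile rank of a subgroup mean between two testings."""
    return (SEPT_DISTRIBUTION_RANK[(subgroup, end)]
            - SEPT_DISTRIBUTION_RANK[(subgroup, start)])


subgroups = ("Experimentals", "Controls")
first_semester = {g: percentile_gain(g, "Sept. 1964", "Jan. 1965")
                  for g in subgroups}
full_year = {g: percentile_gain(g, "Sept. 1964", "May 1965")
             for g in subgroups}

print("first semester:", first_semester)
print("full year:     ", full_year)
```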

The present investigators had previously acknowledged this 
aspect of year-by-year instruction in curricular areas common to 
consecutive school levels. 8, 9 

Performance by ability quarters. One function of Table XXV
is to look beyond mean second-semester gains of 2.65 (non-significant)
standard rating points by the experimental subgroup and 12.74
(significant) by the control subgroup through examining second
semester CEEB data by ability levels. At none of the four levels
did the experimental students have a mean gain which was significant.
Progress within the highest quarter and lowest quarter was moderate;
however, in the middle quarters there was a slight loss. Somewhat
the reverse pattern evolved for the control subgroup. There were
strong gains in the two middle quarters (19.97 and 25.54), both
significant, a slight gain in the top quarter, and a slight loss in
the lowest quarter, neither significant.



8 Jewell, Ross M., et al., Final Report of the Communication
Experiment Conducted by the Department of Languages, Speech, and
Literature of the Iowa State Teachers College, 1955-58. (Cedar Falls,
Iowa, 1960).

9 Jewell, Ross M., and Gordon J. Rhum, The Relative Effectiveness
of Two Methods of Instruction in College Freshman Composition:
Closed-Circuit Television and "Normal" Classroom. (Cedar Falls, Iowa:
State College of Iowa, 1966).

The principal comparisons are between the experimental and control
subgroups in May. The mean difference of 7.30, favoring the controls,
was not significant. Only at the second ability level was the control
subgroup mean significantly higher than the experimental subgroup mean:
18.83, t-ratio of 2.260. At the highest quarter, the experimental
subgroup mean surpassed the control subgroup mean (606.59 compared
to 591.05; t-ratio of 1.714).



Theme Rating, Second Semester 

Overall performance. Table XXVI shows the performance on
theme in January and in May 1965 by the 365 matched pairs of students
in the First Year Group. Analysis of theme data is limited to a
comparison of the experimental subgroup and the control subgroup on
each testing occasion. The basic fact in Table XXVI is that neither
in January nor in May was there a significant mean difference between
the two subgroups. The control subgroup was somewhat ahead: the
mean differences were 0.26 (January) and 0.13 (May). Within each
subgroup, the mean for females was higher than the mean for males.
For experimentals, the mean differences were 1.33 in January and 0.52
in May. For controls, the corresponding figures were 1.01 and 0.58.
It appears that during the second semester the males, although still
significantly lower as a group, had narrowed the gap considerably.

Performance by ability quarters. One of the main functions
of the analysis by ability levels is to see whether the overall
analyses do, in fact, mask important characteristics present at
one or more ability levels. The facts for the four ability levels
presented in Table XXVII indicate that none of these levels is
associated in any special way with the overall mean superiority of 0.13
held by the control over the experimental subgroup. Only in the
highest quarter was the experimental mean higher than the control
mean (by 0.40). None of the differences by ability level were
significant.

Combination of Cooperative English Test: English Expression and
College Entrance Examination Board English Composition Test,
September scores. See page 39.



COOP, First Year 



Overall performance. Tables XXVIII and XXIX present data
for the full freshman year: September 1964 through May 1965.
These tables correspond to the series of tables for the first
semester (Tables XVI to XXI) and the series for the second semester
(Tables XXII to XXVII). Such a first semester, second semester,
and combined semester report is based on the performance of the
365 matched pairs of students in the First Year Group.

In this section, attention will be focused upon gains for 
the nine-month period. On COOP, the gains in mean converted score, 
reported in Table XXVIII, September to May, were 3.90 (experimental) 
and 4.50 (control). For the COOP, the mean gain by the experimentals
was accounted for about equally by each of the two semesters
(1.91, first semester; 1.99, second semester).* For the controls,
about two-thirds of the gain for the year occurred during the first 
semester (2.97 for the first semester, 1.53 for the second semester). 
Gains for each subgroup were significant each semester. 

The September-to-May gains for the two sexes were similar: 
the smallest mean gain, 3.77, was by the experimental females; the 
largest mean gain, 4.63, was by the control females. Both of these 
gains were significant. 

A recurring, striking aspect of the findings, the consistent 
superiority of female means over male means, is seen in the 
approximately three-point mean superiority in both September and 
May. The reader may note from Table XXVIII that this differential 
is approximately three-fourths as large as the mean freshman year 
gain on COOP (3.90 and 4.50). The question, How far are the males 
behind the females on COOP? might be answered by determining at 
what deferred testing date the COOP mean for a representative group 
of male students would be likely to equal the September COOP mean 
for a representative group of female students. The answer: 
apparently about seven months after September, or about March of the 
freshman year. To put it another way, the males lag behind the 
females by an interval of about seven months. 
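
One way to arrive at a figure of this size is sketched below. It is not the investigators' stated procedure: it assumes, hypothetically, that the male mean rises at a constant rate across the nine-month year, and it uses the text's approximation that the female-male gap is about three-fourths of the freshman-year gain.

```python
# Estimate how long it would take a group to close a gap, given that
# the gap equals a known fraction of the gain made over a school year.
# Assumption (not from the report): the mean rises at a constant rate.

SCHOOL_YEAR_MONTHS = 9  # September through May


def lag_in_months(gap_as_fraction_of_year_gain: float,
                  months_in_year: int = SCHOOL_YEAR_MONTHS) -> float:
    """Months needed to gain an amount equal to the gap, at a constant rate."""
    return gap_as_fraction_of_year_gain * months_in_year


# "approximately three-fourths as large as the mean freshman year gain"
lag = lag_in_months(0.75)
print(f"estimated lag: {lag:.2f} months")
```

Under these assumptions the estimate falls a little under seven months, in line with the interval reported above.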

Performance by ability quarters by sex. Table XXIX, like
Tables VIII, XI, and XIV (pages 61, 67, and 73), contains the most
complete analysis made of the data by the investigators. It is a
breakdown by ability quarters, subgroup, and sex. The column
headed "COOP, May minus September Mean" shows that September-to-May
gains were largest in the lowest one-quarter, and that these
relatively large gains were noted for both males and females, and
for both experimentals and controls. That is, at this point,
magnitude of gains was more a function of ability level than of
treatment or sex. In the lowest quarter, the differences between
COOP means in May and in September were of the order of six converted
score points, compared to overall mean differences of about four
converted score points.



*See Table XVI, page 77, and Table XXII, page 87, for these data.

[Table omitted: contents not recoverable from this copy.]
*Significant at 0.05 level (two-tailed test).



CEEB, First Year 



Overall performance. Table XXX shows that mean changes on
CEEB during the freshman year were 32.79 and 41.47, broken down as
follows:

Subgroup        First Semester*   Second Semester**   Full Year

Experimental        30.14               2.65            32.79

Control             28.72              12.74            41.47

Thus the increment for the year was attributable primarily to the
first semester. Therefore, the progress reflected in the analysis
for the full year was not much greater than the progress achieved by
the end of the first semester. It appears that a fair amount of
change occurred during the first semester whether instruction was
present or not, while even the modest change during the second semester
required the presence of instruction. No explanation for this reduced
rate of gain during the second semester suggests itself to the
investigators. In the discussion of Table XXVIII, page 98, the investigators
pointed out that the COOP data showed a limited tendency for the change
during the first semester to be greater than the change during the
second semester.
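
The breakdown in the tabulation can be checked with a short calculation. This is an illustrative sketch using only the semester gains printed above; the variable names are hypothetical.

```python
# Recompute full-year CEEB gains and the share earned in the first
# semester from the two semester gains given in the tabulation.
semester_gains = {
    "Experimental": (30.14, 2.65),
    "Control": (28.72, 12.74),
}

summary = {}
for subgroup, (first, second) in semester_gains.items():
    total = first + second
    summary[subgroup] = {
        "total": total,
        "first_semester_share": first / total,
    }

for subgroup, row in summary.items():
    print(f"{subgroup:<13} total {row['total']:.2f}  "
          f"first-semester share {row['first_semester_share']:.0%}")
```

The control semester gains sum to 41.46 rather than the tabulated 41.47, a gap of the size that rounding of the published semester means would produce. The first-semester shares come to about 92 percent for the experimentals and about 69 percent for the controls.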

Performance by ability quarters by sex. Table XXX indicates
that for ability levels and sexes combined, the mean gain by the
experimentals was 32.79 and by the controls 41.47. In Table XXXI, the
same September-to-May CEEB data are presented by ability levels by sex.
It is realized that as smaller groups are used for analysis, the
resulting evidence is less reliable. The intention is to identify
any suggestions of underlying facts which may be obscured by the larger
picture. The column headed "CEEB, May minus September Mean"
shows that the largest gains occurred in the two lowest ability
quarters, that these gains were of similar magnitude for males
and females, experimentals and controls, and that they were all
significant. These mean advances varied from 46.59 (experimental
females, third quarter) to 63.63 (control females, third quarter).

*See Table XVIII, page 81.

**See Table XXIV, page 91.

Even though there were only 17 control males in the top
one-quarter and the results are therefore subject to sampling
error, it is nevertheless noted that for these 17 males, there
was a mean decrease in CEEB performance of 35.76 (t=1.815, not
significant) from September to May. When the investigators
analyzed the performance of this group by semesters, they discovered
that the mean change during the first semester was
-40.00, and during the second semester +4.23.* These 17 males
happened to represent only three participating institutions.
Two of the institutions had no control males in the top one-quarter
who performed on all three testing occasions.

*The analysis for the first and second semesters for these
17 males is derived from the computer output; it is not presented
in any of the tables included in this report.



Theme Rating, First Year

Overall performance. According to Table XXXII, at the end
of one full year of college the mean theme rating for the 365
control students was 9.76, and for the 365 experimental students,
9.63. The difference of 0.13 was not significant; a t-test for
related measures yields a ratio of 0.864. When the analysis was
made for the sexes separately, the results resembled those for the
combined sexes (see lines 3, 4, 5, and 6 of Table XXXII). In the
lower portion of Table XXXII is the comparison of theme ratings
by sex. The findings for the experimental subgroup are similar to
those for the control subgroup. At the outset (September 1964)
the theme mean for the females was 0.63 higher than that for the
males and at the end of the freshman year, the superiority was 0.52
for the experimental females and 0.58 for the control females. All
these differences are significant. The fact that on the theme, the
superiority of females over males was approximately the same at the
end of the freshman year as it had been at the beginning of the
year may be examined on the basis of the facts for each of the two
semesters. A semester-by-semester analysis shows that at the
beginning of the first semester the female mean theme score was
0.63 higher than that of the males both for the experimentals and
for the controls. At the end of the first semester, a comparison
of male and female means indicates the experimental female mean
was 1.33 higher than that of the experimental males, while the
control female mean was 1.01 above that of the control males.*

Thus, midway in the freshman year, the females exhibited a
greater superiority over the males than they did at either the
beginning or the end of the freshman year. It therefore follows
from all this that on the testing at the end of the second semester
the males performed better relative to females than at the end of
the first semester.

Significant at 0.05 level (two-tailed test)

It was pointed out above that the obtained mean difference 
of 0.13 between experimental subgroup and control subgroup (Table 
XXXII) was not significant. How large a difference in means on 
total theme rating in May would be significant? For the given 
conditions of variability and correlation, it is possible to 
develop an estimate. The standard error of the mean difference 
is 0.146. Thus an obtained difference in means of 0.29 would be 
required for significance. A mean difference of 0.29 would have 
been present had 58 of the 365 control students received a rating 
one point higher than their actual rating. 
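
The estimate in this paragraph can be reproduced as follows. The sketch is illustrative and rests on one labeled assumption: the critical t of 1.97, which the report gives for 364 degrees of freedom in the CEEB discussion, is taken to apply to the theme comparison of the same 365 pairs.

```python
# Reproduce the "how large a difference would be significant" estimate
# for total theme rating in May.
STANDARD_ERROR = 0.146   # standard error of the mean difference (report)
CRITICAL_T = 1.97        # assumed: value the report cites for 364 df
OBSERVED_DIFF = 0.13     # obtained control-minus-experimental difference
N_PAIRS = 365            # matched pairs in the First Year Group

# Smallest mean difference that would reach the 0.05 level.
required_diff = round(CRITICAL_T * STANDARD_ERROR, 2)

# Extra mean difference needed, and the number of control students who
# would each need a rating one point higher to supply it.
shortfall = required_diff - OBSERVED_DIFF
students_needed = round(shortfall * N_PAIRS)

print(f"required difference: {required_diff:.2f}")
print(f"students needing one more point: {students_needed}")
```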

The broadness of the scoring units for total theme rating
may be noted in this connection. The distribution (N=1,978) of
total theme ratings for both matched and unmatched students who
wrote in May 1965 had a median of about 10. That is, a total theme
rating of 10 had a percentile rank of 50. Near the center of this
distribution, a shift of 1 point in theme rating is associated with
a shift of 15 points in percentile rank:



Total Theme Rating      Percentile Rank

        11                    65
        10                    50
         9                    35



                      Total Theme Rating

      September 1964                     May 1965

    N       Mean    S.D.            N       Mean    S.D.

  4,147     9.03    2.59          1,978     9.52    2.41



*See Table XX, page 84. 




Performance by ability quarters by sex. Table XXXIII
shows the theme evidence for September 1964 and May 1965, by
subgroup, by sex, by quarter. In general, there was a direct
relationship between quarters of ability (as derived from scores
on two objective tests in September) and mean theme ratings in
May. The right-hand portion of the table indicates the difference
in means between females and males by subgroup by ability level
in May. At the top of the ability distribution, the male means were
higher than the female means, on the order of 0.40. At the center
and lower part of the ability distribution, the female means were
higher than the male means, from 0.42 to 0.83. None of the
differences in means for females and males were significant; this is
in part related to the fact that the N's were relatively small.



Two Year Sample 



COOP Performance, First Semester 



Overall performance. Table XXXIV contains COOP data based
on the 122 matched pairs of students who completed the entire testing
program. Performance is here reported over their first semester,
September 1964-January 1965. The control subgroup made a greater
gain than did the experimental subgroup (controls 2.59; experimentals
1.88), though the difference in performance between the two subgroups
at the end of the semester (1.12 in favor of the controls) was not
quite large enough to attain significance (attained t=1.956, 1.98
required for significance when N=122, degrees of freedom=121).
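
How narrowly this difference missed significance can be gauged from the figures just given. The sketch below is illustrative only. It relies on the standard relation for a matched-pairs t-test, in which t is the mean difference divided by its standard error, and the results are approximate because the published values are rounded.

```python
# Back out the standard error implied by a reported mean difference and
# its t-value, then find the difference that would just reach the
# critical t. For a matched-pairs test, t = mean difference / SE.
observed_diff = 1.12   # January difference favoring the controls
attained_t = 1.956     # t reported for that difference
critical_t = 1.98      # value the report requires with 121 df

implied_se = observed_diff / attained_t
needed_diff = critical_t * implied_se

print(f"implied standard error: {implied_se:.3f}")
print(f"difference needed:      {needed_diff:.3f}")
print(f"shortfall:              {needed_diff - observed_diff:.3f}")
```

On these figures the obtained difference falls short of the required one by only about one-hundredth of a converted score point.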



Performance by sex. In the middle portion of Table XXXIV the
data are compared by sex: male experimentals compared with male
controls and female experimentals with female controls. The males
in the two subgroups attained about the same mean gains and
end-of-semester means. The female controls, however, ended the semester
significantly superior to the female experimentals (difference of
1.49, t=2.036).

The lower portion of Table XXXIV compares males with females
within each subgroup. Here the females show a significant superiority
over the males in both the experimental and the control
subgroups at the beginning of the semester, that is, in September
1964 (3.65 for experimentals, 2.79 for controls). At the end of
the first semester, the difference between males and females in the
experimental subgroup (2.24) is no longer significant, owing to the
greater gain of the males during the semester (male experimentals
2.77, female 1.37; see middle portion of the table). The female
controls maintained their significant superiority over the male
controls at the end of the first semester (3.26), having gained
slightly more than the males (2.76 to 2.30; middle of the table).

Combination of Cooperative English Test: English Expression and
College Entrance Examination Board English Composition Test,
September scores. See page 39.



COOP Performance, Second Semester



Overall performance. Table XXXV shows the second semester
COOP results for the same 122 matched pairs whose first semester
COOP performance was reported in the preceding table. The overall
results are similar to those of the first semester. Both experimentals
and controls made significant gains during the second
semester (2.59 experimentals, t=5.041; 2.32 controls, t=4.939) as
they had in the first semester. The difference in subgroup mean
of 0.85 in favor of the controls at the end of the second semester
was not significant (attained t=1.506, 1.98 required for significance
with 121 degrees of freedom).

Performance by sex. Comparison between males and females in
each of the two subgroups is presented in the middle and lower
portions of Table XXXV. The gains made by both sexes in each subgroup
on COOP during the second semester were significant (middle
portion of table). Within the experimental subgroup, the differences
in means in favor of the females (about two points ahead in both 
January and May) were not significant. Within the control subgroup, 
the difference in favor of the females (about three points ahead in 
both January and May) was significant. 



COOP Performance, First Year 

Overall performance. Table XXXVI, containing COOP results
for the 122 matched pairs who completed the testing for the project, 
shows performance over the nine months from September 1964 to May 
1965. The gains for both the experimental subgroup and the control 
subgroup (4.47 and 4.91) are significant. The mean difference 
between the performance of the two subgroups on the May testing, 0.85 
in favor of the controls, is associated with a t-value of 1.506, 
with 1.98 required for significance. In short, the 122 members of 
the experimental subgroup performed substantially the same on COOP 
as their control counterparts at the beginning and at the end of 
their freshman year, despite the absence of instruction in freshman 
composition for the experimental students. 

Performance by sex. Essentially the same picture emerges in
subgroup comparison by sex. The experimental males made a slightly
greater gain than the control males over the nine months: 5.50 and
4.80. The between-subgroups difference in favor of the male controls
declined accordingly, from 0.93 at the beginning of the fall semester
to 0.25 at the end of the spring semester. The rounding of decimals
causes the apparent discrepancy. In the female pairs, on the other
hand, the controls increased their advantage, moving from a superiority
of 0.10 in September to a difference of 1.19 in May. All of these
May differences were significant.

THE PERFORMANCE OF 122 MATCHED PAIRS OF STUDENTS ON THE COOPERATIVE ENGLISH TESTS: ENGLISH EXPRESSION IN
SEPTEMBER 1964 AND MAY 1965; INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS,

It is instructive to arrange the four mean gains, September
to May, in order of size.

Experimental Males        5.50

Control Females           4.97

Control Males             4.80

Experimental Females      3.88





It is clear that the absence of instruction did not handicap the 
males on COOP, and that the experimental females exhibited the 
smallest gain. The noteworthy fact is that because of the greater 
gain of the males within the experimental subgroup, the mean of the 
females was significantly higher than that of the males in September 
(3.65), but not in May (2.03). Within the control subgroup, the 
female mean exceeded the male mean significantly both in September 
and in May. 



COOP Performance, Second Year

Overall performance. Table XXXVII presents the performance
on COOP of the 122 persisting matched pairs who completed the May
1966 testing and compares their performance on that occasion with
their performance in May 1965. The top two lines report the
performance of the 122 experimentals and the 122 matched controls.
Each subgroup performed essentially the same way in May 1966 as in
May 1965 (mean declines of 0.67 for experimentals, 0.25 for
controls). Those having had instruction did not perform significantly
better on COOP than those not having instruction at the end
of either the first year or the second year.

Performance by sex. Dividing the subgroups by sex and examining
the results reveals that the males of both subgroups were
similar in their May 1965 performance (mean difference of 0.25).
On the May 1966 testing the control males showed a significant
superiority (mean difference of 3.75). This difference results from
a decline in performance by the experimental males (-2.07) coupled
with a gain by the control males (1.43).

The bottom portion of Table XXXVII contains the array of
COOP scores, May 1965 and May 1966, which shows most clearly the
relationships between males and females within experimental subgroup
and within control subgroup. In May 1965, the female mean
was 2+ points more than the male mean in both subgroups, though
this difference was significant only in the control subgroup.
However, in May 1966 the experimental female superiority was 4+
points and the control female superiority almost disappeared.

This situation reflects these main facts: female subgroups
generally surpass male subgroups; between the two Mays the experimental
males and the control females lost ground. In spite of the
fact that control males were significantly superior to experimental
males and that experimental females were significantly superior to
experimental males in May 1966, the general finding for the
sophomore year is an absence of mean gain for both the experimental
and the control subgroups.



COOP Performance, Two Academic Years

Overall performance. Performance at the beginning of the
freshman year (September 1964) and at the end of the sophomore year
(May 1966) on COOP for the 122 matched pairs completing all project
tests is the subject of Table XXXVIII. Over the span of two
academic years, the experimental and control subgroups show similar
significant gains: 3.80 for the experimentals (t=4.960), 4.66
for the controls (t=[?].053). A t of 1.98 is sufficient for
significance, degrees of freedom=121. The control subgroup started
slightly higher (0.41) than the experimental subgroup and gained
slightly more (0.86), but the end-of-two-year difference (1.28)
between subgroups is not significant (discrepancy is the result of
rounding). As at the end of the first year and at the end of the
second year, there was no significant difference in COOP scores
between those who had composition instruction and those who did
not when the two years are taken together.

Performance by sex. The control males out-gained the experi- 
mental males over the two academic years, 6.23 to 3.43. This dif- 
ference in gain was great enough to provide the male controls a May 
1966 mean significantly higher than the male experimental mean 
(difference of 3.75, attained t=2.645, 2.02 needed for significance 
with 43 degrees of freedom). 

The female experimentals and controls had essentially the 
same mean gains — 4.00 and 3.78. This similarity in mean performance 
by the female experimentals and controls existed both in September 
1964 and May 1966. 

The lower part of Table XXXVIII shows relationships of the 
sexes within each subgroup. The females were significantly 
superior to the males in September 1964: experimentals 3.65 and 
controls 2.79. In May 1966 the experimental females continued to 
show a significant superiority over the males (4.21) but the 
control females did not maintain a significant superiority over 
the control males (mean May 1966 difference=0.34). Within the 
control subgroup the two-year mean gain by the males was 6.23, and 
by the females 3.78. 






Summary 

Tables XXXIV through XXXVIII have presented COOP data for 122 
matched pairs for four segments of the first two years of college 
and for the full two-year period. The general evidence concerning 
mean COOP gains is summarized in the following chart: 



Interval                  Experimentals    Controls

First Semester            GAIN             GAIN
Second Semester           GAIN             GAIN
First Academic Year       GAIN             GAIN
Second Academic Year      NO GAIN          NO GAIN
Two Academic Years        GAIN             GAIN



There was a no-gain situation only during the sophomore year interval. 



The main comparisons in this study deal with experimental and 
control subgroups. The evidence indicates that there was no sig- 
nificant difference between the scores on COOP of those who had had 
instruction in composition during the freshman year and those who had 
not had such instruction. 



CEEB Performance, First Semester 

Overall performance. Table XXXIX is the first in a series of 
five tables presenting the performance on CEEB of the 122 matched 
pairs who completed the entire testing program for the project. Table 
XXXIX concerns the performance at the beginning and at the end 
of the first semester in college, September 1964, and January 1965. 

The top two lines of Table XXXIX present the performance of the 
entire group of 122 pairs. Both the experimental subgroup and the 
control subgroup showed a significant gain in performance during 
the semester. The improvement of the control subgroup over the 
semester was not as great as that of the experimental subgroup 
(23.22 control, 34.86 experimental), with the result that in January 
the control mean was 15.20 less than the experimental mean, accounted 
for by the September disadvantage (3.56) and the smaller gain 
(11.64). This shows that of the group completing two years those 
not having composition instruction in the first semester scored 
significantly higher on CEEB at the end of that semester than did 
those having such instruction. 

*Significant at 0.05 level (two-tailed test). 



Performance by sex. Significant gains were made during the 
first semester by the experimental males (31.82), the experimental 
females (36.58), and the control females (31.63). The gain of the 
control males (8.32) was not significant. The smaller gain by 
control males resulted in a significant superiority on January per- 
formance in favor of the experimental males (difference=31.43, 
t=2.233). Though the experimental females gained more than the 
control females (36.58 to 31.63), the difference on January means 
(6.04) was not significant. 

Comparison of the performance of male and female members 
of the same subgroup shows a similar conclusion: that the females 
were somewhat, but not significantly, superior to the males in 
both subgroups in September (14.84 for experimentals, 21.69 for 
controls). When January comparisons are made, experimental and 
control females have increased their superiority over the corre- 
sponding males, the experimentals by 4.77 and the controls by 
23.31. The superiority of control females over control males in 
January (45.00) was significant. That is, instruction in composi- 
tion had a significantly greater effect on CEEB scores for women 
than for men at the end of the first semester. 
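
The within-subgroup comparison above is plain arithmetic on the reported figures. As a minimal sketch (the variable and function names are illustrative and do not come from the report), the January female-minus-male gap equals the September gap plus the difference between the female and male first-semester gains:

```python
# Figures quoted in the text for the 122 matched pairs (CEEB points).
# Female mean minus male mean, September 1964, within each subgroup.
sept_gap = {"experimental": 14.84, "control": 21.69}

# First-semester mean gains by subgroup and sex.
gain = {
    ("experimental", "M"): 31.82, ("experimental", "F"): 36.58,
    ("control", "M"): 8.32, ("control", "F"): 31.63,
}

def january_gap(subgroup):
    """September gap plus (female gain - male gain) for one subgroup."""
    widening = gain[(subgroup, "F")] - gain[(subgroup, "M")]
    return round(sept_gap[subgroup] + widening, 2)

for subgroup in sept_gap:
    print(subgroup, january_gap(subgroup))
```

The control result agrees with the 45.00 reported for January. For the experimentals the widening computed from the rounded gains is 4.76 rather than the 4.77 quoted, presumably because the report worked from unrounded means.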



CEEB Performance, Second Semester 

Overall performance. Table XL is the second in the series 
presenting performance on CEEB of the 122 matched pairs completing 
the full project testing program. Data in Table XL are for the 
second semester, January 1965 and May 1965, test administrations. 

The experimental subgroup started the semester 15.20 points higher 
than the control subgroup, gained almost nothing during the semester, 
and ended the semester 4.31 points lower than the control subgroup. 

In contrast, the control subgroup showed a significant mean gain of 
19.54. At the end of the second semester, there was no significant 
difference in the overall performance on CEEB between those who 
had had composition and those who had not. 

Performance by sex. The mean gain of 19.54 by the control 
subgroup resulted from a mean gain of 22.93 by the 44 males and 
17.63 by the 78 females, both of these being significant. The mean 
gain of 0.03 by the experimental subgroup resulted from a mean gain 
of 7.00 by the males and a mean loss of 3.90 by the females. The 
significant superiority of the experimental males over the control 
males on CEEB in January was lost in May, though the difference 
(15.50) was still in favor of the experimentals. The control 
females gained slightly more than the control males, although 
not enough to produce a significant difference between them. 

Comparison of the performance of males and females within 
subgroups consistently shows females superior to males, with 
significant superiority for control females both in January 
(45.00 points, t=2.685) and in May (39.70 points, t=2.732). 



CEEB Performance, First Year 

Third in the series reporting performance on CEEB for the 
122 matched pairs completing the full project testing program is 
Table XLI, which gives the facts for the September 1964 and May 
1965 administrations of CEEB, the beginning and the end of the 
students' freshman year in college. 

Overall performance. Examination of the data indicates 
that the experimental subgroup began the year with a slight (3.56) 
advantage. Both subgroups made significant gains over the testing 
period, the experimentals improving by 34.89 points (t=5.507) and 
the controls by 42.76 points (t=6.934). Both gains were highly 
significant. The difference between the two subgroups at the end 
of the second semester was slight (4.31), indicating that instruction 
in composition had no significant effect on CEEB scores at the end 
of the year. 
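
The first-year figures can be cross-checked against the two semester reports. The sketch below is illustrative only: it uses the gains quoted in this chapter, with hypothetical variable names, and tracks the experimental-minus-control difference in CEEB means from September to May.

```python
# Semester mean gains on CEEB quoted in the text (122 matched pairs).
semester_gains = {
    "experimental": [34.86, 0.03],   # first semester, second semester
    "control": [23.22, 19.54],
}
sept_difference = 3.56               # experimental mean minus control mean, September 1964

# First-year gain is the sum of the two semester gains.
year_gain = {group: round(sum(g), 2) for group, g in semester_gains.items()}

# Experimental-minus-control difference after each semester.
differences = [sept_difference]
for exp_gain, cont_gain in zip(semester_gains["experimental"],
                               semester_gains["control"]):
    differences.append(round(differences[-1] + exp_gain - cont_gain, 2))

print(year_gain)      # first-year gain for each subgroup
print(differences)    # September, January, May
```

A negative final value means the control mean was the higher one in May, which matches the 4.31 reported above; the intermediate value corresponds to the 15.20 January difference given in the first-semester discussion.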

Performance by sex. Within the male group, the experimentals 
had a slightly higher mean gain than the controls: 38.82 to 31.25. 
Within the female group, the controls had a higher mean gain than the 
experimentals: 49.26 to 32.68. All four gains were significant. 
At the end of the first full year, the experimental males scored 
higher on CEEB than the control males (15.50), and the control 
females scored higher than the experimental females by an almost 
identical amount (15.49). 

Comparison by sex within subgroups shows that while the 
females in both subgroups were superior to the males, the experi- 
mental females were not significantly so (8.71); the control 
females were significantly superior (39.70). 



CEEB Performance, Second Year 

Overall performance. Table XLII presents the performance 
on CEEB, in May 1965 and May 1966, of the 122 matched pairs who 
completed the entire testing program. In examining this table it 
is important to remember that, during this second year of the study 
(1965-66), neither the experimental subgroup nor the control sub- 
group received instruction in freshman composition. During this 
twelve-month period each subgroup displayed about the same improve- 
ment, and in each case the improvement was significant. It is 
noted that the mean CEEB gains during the sophomore year, on the 
order of 20 points, were about half as large as the mean gains 
during the freshman year (Table XLI). As in previous between- 
subgroup comparisons on CEEB, neither subgroup displayed a 
significant superiority on the May 1966 testing. 

Performance by sex. The comparisons of males and females are 
presented, as in previous tables, in two ways. In the central por- 
tion of the table there is a between-subgroups comparison of males 
and females. Both experimental and control males improved 
significantly on CEEB during the second year; the males in the 
experimental subgroup performed somewhat better than the males in the 
control subgroup on both occasions. In contrast, the females of 
the control subgroup, whose gain during the sophomore year was not 
significant, performed better both Mays than the females in the 
experimental subgroup, whose gain was significant. None of these 
mean differences between subgroups for the males or females on 
either testing occasion was significant. 

In comparing males and females of a given subgroup, a 
noticeable dissimilarity between the subgroups is apparent. Within 
the experimental subgroup, the two sexes performed in essentially 
the same way in May 1965 and in May 1966, though the females dis- 
played a slight advantage. Within the control subgroup, however, 
the females displayed a strong advantage on both occasions. The 
control females exhibited the highest mean scores of any of the 
four subgroups. 



CEEB Performance, Two Academic Years 

Overall performance. Table XLIII compares performance 
on CEEB of the 122 matched pairs who completed project testing 
through May 1966. The period between tests is four semesters, 
from September 1964 to May 1966, the beginning of the freshman 
year to the conclusion of the sophomore year. During this period 
both the experimental and control subgroups made significant 
improvement on CEEB. Both the experimental and control subgroups 
had beginning freshman means in the 490's, had two-year gains of 
about 58 points, and thus had ending sophomore means in the 550's. 
The gains were significant. Between-subgroup mean differences 
were not significant; composition instruction had no significant 
effect on CEEB scores at the end of the second year. 

Performance by sex. When the evidence is analyzed by sex 
by subgroup, mean gains during the two years were roughly the same 
for males as for females. These four gains varied from 51.23 for 
control males to 63.05 for control females, all significant. Dif- 
ferences between control and experimental males, slightly in favor 
of the experimentals both in September 1964 and May 1966 (7.93 
and 17.84), were not significant. Differences between the female 
experimentals and controls were slight. 

When male-female performance within subgroups is examined, 
the females in each subgroup are found to be superior to the males, 
the control females showing a significant superiority of 
33.51 points (t=2.374) on their May 1966 means, the experimental 
females showing a non-significant superiority of 9.54. 



Summary 



Tables XXXIX through XLIII have presented data for 122 
matched pairs of students on CEEB at each of four testing points 
in the first two years of college and for the full two-year period. 

The general data showing gains on CEEB are summarized in the following 
chart: 

Interval                  Experimentals    Controls

First Semester            GAIN             GAIN
Second Semester           GAIN             GAIN
First Academic Year       GAIN             GAIN
Second Academic Year      GAIN             GAIN
Two Academic Years        GAIN             GAIN



As the chart indicates, gain occurred between each set of testing 
occasions. Over no segment of time did the students in either sub- 
group fail to make some gain in performance on CEEB. 



Theme Performance, First Semester 

Theme performance for the 122 matched pairs available 
through the first two college years is presented in Tables XLIV 
through XLVIII. The data for the first semester appear in Table 
XLIV. Each table contains facts about theme ratings at the 
beginning and at the end of the given interval. Analysis of theme 
ratings is between subgroups for each testing occasion, and not 
within subgroups between two testing occasions. As has been 
pointed out previously, it is not meaningful to investigate change 
in theme performance over a specified interval (see page 42). 

Overall performance. In September 1964 the experimentals 
and the controls had identical means (9.31), because perfect matching 
on theme score rating was required. At the end of the first 
semester, the control mean was 0.22 higher than the experimental 
mean; this difference was not significant. 

The correlation data are of interest. The matching pro- 
cedures produced an r of 1.00 between the September theme ratings 
of the 122 experimentals and their 122 matched controls. The r at 
the end of the first semester was 0.29. This substantial decrease 
in the coefficient of correlation stems from the following: the 
relatively small number of rating values (the rating scale ranged 
from 2 to 18), the factors of unevenness in achievement within the 
matched pairs, and the unreliability present in the theme ratings. 
For the two objective tests, COOP and CEEB, the between-subgroup 
correlations were, in September, lower than 1.00, but in January 
higher than 0.29. (Additional discussions of correlation data will 
be found on page 50.) 

Performance by sex. The analyses by sex show that in Janu- 
ary the mean for control males was higher (0.36), though not 
significantly higher, than that of experimental males. Control 
females were even less superior (0.12) to experimental females. In 
September the mean for females was 0.38 higher than the mean for 
males in both subgroups. In January, the differences were 0.99 for 
the experimental subgroup and 0.76 for the control subgroup. Of 
these four female-male mean differences, only the 0.99 was sig- 
nificant. 



Theme Performance, Second Semester 



Overall performance. Table XLV contains the end-of-first- 
semester theme data which were reported in the preceding table 
(XLIV) and also the end-of-second-semester theme data for the 122 
pairs who finished the full project testing program. At the end 
of the freshman year, the mean theme ratings for the experimental 
subgroup and the control subgroup were only 0.06 apart (9.93 
experimental, 9.87 control). That is, as measured in this project, 
theme ratings of students finishing two years of college who had 
completed freshman composition were not significantly different at 
the end of two semesters from theme ratings of students who had had 
no freshman composition. 

Performance by sex. The experimental-control similarity was 
present for both the males (mean difference of 0.20 in favor of the 
experimentals) and the females (mean difference of 0.00). The 
disparity between the mean for females and the mean for males was 
less at the end of the second semester than at the beginning of the 
semester. Within the experimental subgroup, the mean difference, 
females minus males, dropped from 0.99 (which was significant) to 
0.37 (which was not significant); within the control subgroup, from 
0.76 to 0.57 (neither significant). Thus there is a suggestion 
that at the end of the second semester the males were in a better 
position relative to the females than they had been at the 
beginning of that semester. 



Theme Performance, First Year 

Overall performance. The September and May data for the 
1964-65 year in Table XLVI complete the analysis of theme per- 
formance of the first two semesters. The key fact is that the 
experimental subgroup and the control subgroup, which had identical 
means in September and a differential of only 0.22 in the middle of 
the year (Table XLV), also had near-identical means in May (dif- 
ference of 0.06 in favor of experimentals). 

Performance by sex. There was also close agreement between 
the September data and the May data as regards the superiority of 
females over males: 0.38 for both in September, 0.37 (experimentals) 
and 0.57 (controls) in May. None of these differences were signifi- 
cant (approximately 0.80 would be required for significance). 



Theme Performance, Second Year 

Overall performance. Table XLVII covers the sophomore year. 
It is unique in that it represents the period during which neither 
the experimental nor the control subgroup received formal instruction 
in freshman composition. In May 1966 the experimental mean (9.70) 
was 0.31 higher than the control mean (9.39), a difference too small 
for significance, though slightly larger than the mean difference at 
the end of the first year (0.06). 

Performance by sex. The analysis by sex yielded results 
similar to those of the total group, the experimental males scoring 
0.43 over control males, experimental females 0.24 over control 
females. Corresponding mean differences at the end of the first 
year had been 0.20 and 0.00. The most interesting disclosure of 
Table XLVII is that the experimental males, who were 0.37 behind 
the experimental females at the end of the freshman year, were 0.11 
ahead of the experimental females at the end of the sophomore year, 
and that the superiority of the control females over the control 
males dropped from 0.57 at the end of the freshman year to 0.08 at 
the end of the sophomore year. 

How, then, did the mean theme ratings at the end of the 
sophomore year compare with those at the end of the freshman year? 
First, those who had no freshman composition performed on both 
occasions about as well as those who did have freshman composition. 
Second, the slight superiority of females over males on theme 
rating, evident at the end of the freshman year (and on nearly all 
objective tests at all testing sessions; see Tables XXXIV-XXXVIII 
for COOP, Tables XXXIX-XLIII for CEEB), did not exist at the end of 
the sophomore year. 



Theme Performance, Two Academic Years 

Table XLVIII depicts theme performance at the beginning and 
at the end of the first two college years. Analyses of theme data 
are always between groups as of a specified date and never within 
groups between two specified dates. 

Overall performance. The data in this table are very 
similar to those of Table XLVII, the data for May 1966 being 
identical. By May 1966 the experimental subgroup scored a little 
higher (0.31), though not significantly higher than the control 
subgroup; both subgroups had the same mean in September 1964. 

Performance by sex. In September 1964 the experimental and 
control subgroups had identical mean theme ratings within each 
sex (males each 9.07, females each 9.45). In May 1966 experimental 
males exceeded control males by a difference of 0.43; experimental 
females exceeded control females by 0.24. Neither of these 
differences is significant. Experimental males were 0.11 ahead 
of experimental females; control females were 0.08 ahead of 
control males. 
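
One consistency check on the September figures, offered as a sketch rather than as part of the original analysis: if 9.07 and 9.45 are read as the male and female means (an interpretation, chosen because it matches the 0.38 female-male difference reported for September), then weighting them by the 44 males and 78 females in each subgroup should return the overall September mean of 9.31.

```python
# Counts per subgroup as given in the CEEB second-semester discussion.
n_male, n_female = 44, 78

# Assumed reading of the September 1964 theme means (see lead-in).
mean_male, mean_female = 9.07, 9.45

# Weighted mean over all 122 members of a subgroup.
overall = (n_male * mean_male + n_female * mean_female) / (n_male + n_female)

print(round(overall, 2))                  # overall September mean
print(round(mean_female - mean_male, 2))  # female-minus-male difference
```

Under the alternative reading, in which 9.07 and 9.45 are subgroup means, the two subgroups could not share the single September mean that perfect matching on theme rating requires.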



Conclusions for the data in Table XLVIII are the same as 
those for Table XLVII: after two years of college, students not 
receiving instruction in freshman composition performed as well 
on the theme as students who had received such instruction. 
Females, who in September 1964 were superior to males, not 
significantly on theme (0.38 in each subgroup) but significantly 
on objective tests, held no consistent advantage on theme in May 
1966: the control females were 0.08 ahead of the control males, 
and the experimental females were 0.11 behind the experimental 
males. Two academic years after the first testing, the scores of 
the females and the males were even closer together. 



Summary 



As the investigators do not believe the theme performances 
can be legitimately examined for gains, the summary below indicates 
the relative position of the subgroups of the 122 pairs who completed 
the full two academic years of the investigation on theme performance 
at each testing point. The following chart is based upon Tables XLIV 
through XLVIII. 



Testing Point                   Relative Position of Exp. and Cont.

Beginning of First Semester     SAME
End of First Semester           CONTROL HIGHER
End of Second Semester          EXPERIMENTAL HIGHER
End of Fourth Semester          EXPERIMENTAL HIGHER



The difference in favor of the controls at the end of the first 
semester was not great enough to be significant. On the May 1965 and May 
1966 testing occasions, the difference, in favor of the experimentals, 
was likewise not significant. 



SUMMARY, CONCLUSIONS, AND OBSERVATIONS 



Overall Findings 



The present study tested the hypothesis that the writing 
performance of students enrolled in a college composition sequence 
is not significantly different from the writing performance of 
comparable students not enrolled in a college freshman composition 
course when the two subgroups have attended college for an equal 
length of time. The total sample was composed of a representative 
sample from each of five state universities. 

Students who received instruction excelled students who had 
not received instruction at the end of the first semester, and 
tended to surpass them at the end of the second semester; at the 
end of the fourth semester the two groups performed about the same. 
The data are summarized in the tabulation below. In this presenta- 
tion C signifies that the control subgroup, the students who 
received instruction, had the higher obtained mean. E signifies 
that the experimental subgroup, the students who did not receive 
instruction, had the higher obtained mean. C* signifies that the 
controls had a significant superiority, E* that the experimentals 
had a significant superiority. 

                               January 1965    May 1965       May 1966
           Number of           (end of 1st     (end of 2nd    (end of 4th
Test       Matched Pairs       Semester)       Semester)      Semester)

COOP       597                 C*
CEEB       597                 C
Theme      597                 C*

COOP       365                 C*              C*
CEEB       365                 E               C
Theme      365                 C               C

COOP       122                 C               C              C
CEEB       122                 E               C              E
Theme      122                 C               E              E



It is evident that at the end of the first semester the 597 
students who had received instruction in freshman composition per- 
formed significantly better than those who had not, on COOP and on theme 
performance. At the end of the second semester, the 365 control 
students performed significantly better than their experimental 
matches on COOP, but essentially the same on CEEB and theme. In May 
1966 the 122 control students and their experimental matches per- 
formed in essentially the same way. 

Experimental and control subgroups were compared 18 times. 

These facts concerning the performance of the subgroups may be 
summarized both from the point of view of the number of times a 
given subgroup excelled the other and from the point of view of the 
performance of the subgroups on each of the three test instruments. 
Summarizing first from the point of view of the number of times one 
subgroup excelled the other: 

1. In the 18 comparisons, 4 showed a significant difference 
between the subgroups, in all 4 the control subgroup excelled. 

2. In the 14 comparisons which did not reveal a significant dif- 
ference between the two subgroups, the higher obtained mean was 
achieved 9 times by the controls and 5 times by the experimentals. 

3. At no testing point was there a significant difference between 
the subgroups (122 pairs) who completed two years of college. 

The controls attained a superiority in observed mean 5 times, the 
experimental subgroup 4 times. 

Summarizing next in terms of the performance of the subgroups 
on the testing instruments employed: 






1. COOP - The control subgroup mean was significantly higher 
than the experimental subgroup mean in 3 comparisons and 
somewhat higher in 3 comparisons. 

2. CEEB - In none of the 6 comparisons was there a significant 
difference between the means of the two subgroups: the 
obtained means favored each subgroup 3 times. 

3. Theme - The control subgroup mean was significantly higher 
once. On the other 5 occasions, the obtained mean favored 
the controls 3 times and the experimentals twice. 

In terms of testing instruments, then, COOP yielded superiority for 
the controls; the theme, if it favored either subgroup, favored the 
controls; the CEEB evidence suggested essential similarity between 
the control subgroup and the experimental subgroup. In general, 
COOP denied the hypothesis, the theme leaned slightly toward denial, 
and CEEB neither confirmed nor denied the hypothesis. 
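
The counts in the two summaries above can be re-derived mechanically from the tabulation of C, E, C*, and E* entries. The following is a minimal sketch: the dictionary transcribes the tabulation as read from this copy of the report, and the names `RESULTS` and `tally` are illustrative.

```python
from collections import Counter

# (test, matched pairs) -> outcomes in testing order (January 1965, May 1965,
# May 1966), for as many occasions as that group was tested.
# "C"/"E": control/experimental had the higher obtained mean; "*" = significant.
RESULTS = {
    ("COOP", 597): ["C*"],
    ("CEEB", 597): ["C"],
    ("Theme", 597): ["C*"],
    ("COOP", 365): ["C*", "C*"],
    ("CEEB", 365): ["E", "C"],
    ("Theme", 365): ["C", "C"],
    ("COOP", 122): ["C", "C", "C"],
    ("CEEB", 122): ["E", "C", "E"],
    ("Theme", 122): ["C", "E", "E"],
}

def tally(results):
    """Count how often each outcome code occurs across the given rows."""
    return Counter(code for codes in results.values() for code in codes)

overall = tally(RESULTS)
by_test = {
    test: tally({key: v for key, v in RESULTS.items() if key[0] == test})
    for test in ("COOP", "CEEB", "Theme")
}
pairs_122 = tally({key: v for key, v in RESULTS.items() if key[1] == 122})

print(sum(overall.values()), dict(overall))
print({test: dict(counts) for test, counts in by_test.items()})
print(dict(pairs_122))
```

Tallied this way, the tabulation gives 18 comparisons, of which 4 are significant and all 4 favor the controls; the remaining 14 split 9 to 5, and the 122-pair rows split 5 to 4, in agreement with the numbered statements.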

Do college students who have had formal course work in freshman 
English composition perform better on tests related to writing than 
comparable students who have not had the formal course work? Evidence 
of performance on the tests used in this study has shown that the 
answer at the end of the first semester is "Yes," at the end of the 
second semester, a qualified "Yes," and at the end of the fourth 
semester, "No." The two subgroups of students who finished the two 
years appear to be substantially equal. 



The design of this study has combined the performance of students 
from several universities, each of which had a freshman program some- 
what different from each of the others. At each university, members of 
the control subgroup received their instruction from several different 
instructors. Each instructor interpreted the official syllabus of his 
institution in his own way. None of the evaluative instruments employed 
in this study was attuned to a particular instructor or a particular 
university. Thus such differences as appear between control and experi- 
mental subgroups reflect the common elements which are present independently 
of the unique standards and qualities stressed in a particular program 
or class. Because the obtained differences are based on such common 
elements, they constitute only a partial basis for evaluating a program 
at one of the cooperating universities. 



Findings by Sex 



In both the Interim Report and the present study, the 
investigators have been interested in the relationship between 
sex of students and their performance on tests related to compo- 
sition. The investigators’ belief that there is such a 
relationship influenced the matching procedures, which employed 
sex as one of the criteria for matching students. Throughout the 
present report, except when numbers were so small that a division 
by sex would have led to confusion, performance has been reported 
both for total subgroups and for the male and female components 
of those subgroups. The present discussion summarizes the findings 
concerning performance by sex. 

Superiority of female performance. It is useful at the 
outset to examine the distribution of performance on a national 
test, as it was in terms of such a distribution that the samples 
were selected. Data are at hand showing the performance of a 
normative sample of 882,080 high school seniors, and of 703 fresh- 
man males and 1,075 freshman females enrolled at the University of 
Northern Iowa in the fall of 1966. As described on page 39, the 
selection of students for the current investigation was in terms 
of distributions of scores on American College Testing Program, a 
separate distribution for each sex. The following tabulation is 
an illustration of the differences among the three distributions 
of ACT scores. 










                  Percentile Rank,
                  National College-     Percentile Rank,    Percentile Rank,
ACT English       Bound High School     UNI Freshman        UNI Freshman
Standard Score    Seniors               Males, N=703        Females, N=1,075

33                99                                        99
31                99                                        99
30                                      99                  98
24                81                    76                  56
21                59                    43                  22
14                13                     3                   0
11                 5                     0                   0



Among the illustrative standard scores included are, for the females, 
the highest score (33), the lowest score (14), and the score half- 
way between (24), not necessarily the median. For the males, the 
highest standard score was 30, the lowest 11, and the score half-way 
between, 21. The superiority of the female is clear when one notes 
that the middle female score (24) has a percentile rank of 56 in the 
UNI female score distribution, and of 76 in the UNI male distribution. 
The middle score of the male range, 21, has a corresponding percentile 
rank of 43 in the UNI male distribution and 22 in the UNI female 
distribution. In short, the typical male performs less well on the 
ACT English test than does the typical female. 

The differences in performance of males and females illustrated
in the tabulation above are reflected in the male-female performance
of the 1,040 matched pairs (422 males, 618 females) from the
cooperating universities in September 1964, the outset of the
study. The performance by sex, and the t-ratios for differences
in mean performances, are shown below.

                                                     Female
Subgroup    N    Sex   Variable    Mean     S.D.   minus Male   t-Ratio    d.f.

Exp.       422    M    COOP       159.12    7.55
Exp.       618    F    COOP       163.28    7.32      4.16       8.877*   1,038
Cont.      422    M    COOP       159.07    7.48
Cont.      618    F    COOP       163.37    7.19      4.30       9.307*   1,038
Exp.       422    M    CEEB       445.83   85.41
Exp.       618    F    CEEB       489.94   81.99     44.11       8.368*   1,038
Cont.      422    M    CEEB       446.67   85.01
Cont.      618    F    CEEB       489.51   81.52     42.84       8.170*   1,038
Exp.       422    M    Theme        8.37    2.26
Exp.       618    F    Theme        9.42    2.09      1.05       7.688*   1,038
Cont.      422    M    Theme        8.37    2.26
Cont.      618    F    Theme        9.42    2.09      1.05       7.688*   1,038

*Significant at the .01 level (two-tailed test).
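The t-ratios in the tabulation above can be approximately rechecked from the printed means, standard deviations, and Ns. The sketch below is not the investigators' program, which is not reproduced in this report. It assumes a pooled-variance t-ratio for two independent groups and further assumes that the printed S.D. values were computed with N (not N - 1) as the divisor. Under those assumptions the results come within rounding of the printed ratios for the two COOP rows. The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Pooled-variance t-ratio (female minus male) from summary statistics.

    Assumption: each S.D. was computed with N as the divisor, so the sum of
    squares for a group is N * S.D.**2.
    """
    df = n_m + n_f - 2
    pooled_var = (n_m * sd_m ** 2 + n_f * sd_f ** 2) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_m + 1.0 / n_f))
    return (mean_f - mean_m) / std_error, df

# September 1964 COOP rows, values copied from the tabulation above.
t_exp_coop, df = pooled_t(159.12, 7.55, 422, 163.28, 7.32, 618)
t_cont_coop, _ = pooled_t(159.07, 7.48, 422, 163.37, 7.19, 618)

print(f"Exp.  COOP t = {t_exp_coop:.3f}, d.f. = {df}")
print(f"Cont. COOP t = {t_cont_coop:.3f}, d.f. = {df}")
```

If the S.D. values were instead computed with N - 1 as the divisor, the ratios would differ slightly in the second decimal place, so agreement here is suggestive rather than proof of the method used.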







The superiority of females reflected in the above tabulation 
may at first glance seem at variance with conclusions reached by 
other investigators. In an attempt to probe the male-female per- 
formance in writing, the investigators examined the studies of Hunt 
(1965), Riling (1965), O'Donnell and Griffin (1967), and Loban (1966).
All of these are studies of the development of syntactic control in 
children. The authors comment, directly or indirectly, on the dif- 
ferences in performance between males and females. Hunt indirectly, 
and O'Donnell and Griffin directly, indicate that though males are 
initially at a disadvantage, the "gap" closes as the students grow
older. After indicating that "In writing girls in Grades 3 and 5
appeared to be superior to boys" (p. 96), O'Donnell and Griffin
state that "in the seventh grade . . . the relative positions of the
sexes were clearly reversed on the scales taken to indicate
syntactic skill" (p. 96). Hunt's tabulations of
T-units shorter than nine words and longer than twenty words 
(pp. 28 and 31) would support somewhat the same conclusion, as 
at the eighth and twelfth grade levels the boys write fewer T- 
units shorter than nine words than do the girls, and at the twelfth 
grade level they write more T-units longer than twenty words than 
do the girls. The number of words per T-unit is used by Hunt as an 
index of maturity in writing. 

Riling (p. 87) and Loban (p. 90), on the other hand, assert 
that the best boys do better than the best girls, but clearly 
imply that in general the boys do worse than the girls, and that 
the worst boys are worse than the worst girls. In an attempt to 
check on the latter point, the present investigators examined per- 
formance on COOP of 4,190 students for whom data were available in 
the present study (see discussion, Table VIII, page 60). The
investigators stated ". . . that only at the top edge of the distri- 
bution (the top 2 percent) do males equal or excel females. At the 
bottom edge of the distribution the reverse is true; the male group 
falls below the females." (p. 62). This conclusion is supported by 
examination of the quarters-by-sex tables for the test instruments 
(Tables VII, p. 59; XI, p. 67; XIV, p. 73; XXIX, p. 99; XXXI, p. 102; 
XXXIII, p. 107). These tables show, both in the means for the males
and in the means for the females, and in the proportion of each sex 
in the top and the bottom quarters, that the male distribution tends 
to the lower quarters, the female to the higher quarters. In short, 
the statements by Riling and Loban are consistent with the findings 
of this study — that while the very best writer, or performer on tests 
related to writing, may be a male, the mean of the males is lower 
than the mean of the females. 

It is important, also, to remember that the O’Donnell and 
Griffin, Hunt, Riling, and Loban studies concerned the syntactic 
virtuosity of the children studied. That is, the investigators 
inquired into the kinds of syntactic structures the students 
employed. It is on such measures that they estimate the linguistic 
maturity of their subjects. In the present study, such matters were 
hardly noticed. Neither the objective tests nor the theme evalu- 
ations involved analyses of sentence structure, T-units, or other 
syntactically defined linguistic entities. It is not possible to 
compare the two kinds of studies in any direct way. One may theorize 
that the readers of the compositions in the present study reacted 
in some degree to the syntactical resourcefulness displayed in the 
papers. However, the degree to which such resourcefulness, or par- 
ticular manifestations of such resourcefulness, influenced the 
readers' decisions is, of course, unknown. For the most part, the
readers were probably not aware of the syntactic elements in the 
papers. 



The relationship of male to female means presented in the
various tables throughout this study is summarized in the following
tabular representation, in which F stands for a female mean
not significantly greater than a male mean, an M a male mean not
significantly greater than a female mean, and an F* or M* a
significant difference.



                               Sept. 1964     Jan. 1965      May 1965       May 1966
Test    N Pairs(a)   M/F       Exp.  Cont.    Exp.  Cont.    Exp.  Cont.    Exp.  Cont.

COOP       597      470/724    F*    F*       F*    F*
CEEB       597      470/724    F     F        F     F
Theme      597      470/724    F*    F*       F*    F*

COOP       365      268/462    F*    F*       F*    F*       F*    F*
CEEB       365      268/462    F*    F*       F*    F*       F*    F*
Theme      365      268/462    F*    F*       F*    F*       F*    F*

COOP       122       88/156    F*    F*       F     F*       F     F*       F*    F
CEEB       122       88/156    F     F        F     F*       F     F*       F     F*
Theme      122       88/156    F     F        F*    F        F     F        M     F

(a) The reader should remember that each succeeding N represents the
persisting members of the preceding N.



In the 54 comparisons presented, males out-performed females
on one occasion and the females were significantly superior on 35;
the males were never significantly superior. Thus females performed
significantly better than males on nearly two-thirds of the testing
occasions. The control females were significantly superior on 19
occasions. There is therefore little doubt that in the population
studied, the females were superior on tests related to writing
ability. Twenty-six of the 30 comparisons made through May 1965
show the females significantly above the males. In the group which
completed testing through January 1965 (N=597 pairs), the females
were significantly superior on all theme and COOP comparisons,
superior by an amount short of significance on CEEB only.
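The counts quoted in this paragraph can be rechecked by tallying the cells of the summary tabulation. The sketch below simply transcribes those cells, in column order (Exp. then Cont. for each testing date), and counts them. The variable names are hypothetical and the transcription is ours, so any disagreement would point to a transcription slip rather than to the study's data.

```python
# Each row: (test, number of pairs, cells in column order).
# Column order: Sept. 1964 Exp., Cont.; Jan. 1965 Exp., Cont.;
#               May 1965 Exp., Cont.; May 1966 Exp., Cont.
ROWS = [
    ("COOP", 597, "F* F* F* F*"),
    ("CEEB", 597, "F F F F"),
    ("Theme", 597, "F* F* F* F*"),
    ("COOP", 365, "F* F* F* F* F* F*"),
    ("CEEB", 365, "F* F* F* F* F* F*"),
    ("Theme", 365, "F* F* F* F* F* F*"),
    ("COOP", 122, "F* F* F F* F F* F* F"),
    ("CEEB", 122, "F F F F* F F* F F*"),
    ("Theme", 122, "F F F* F F F M F"),
]

# Flatten to (n_pairs, is_control_column, cell); odd positions are Cont.
cells = [
    (n_pairs, position % 2 == 1, cell)
    for _, n_pairs, row in ROWS
    for position, cell in enumerate(row.split())
]

total = len(cells)
female_sig = sum(cell == "F*" for _, _, cell in cells)
male_higher = sum(cell.startswith("M") for _, _, cell in cells)
male_sig = sum(cell == "M*" for _, _, cell in cells)
control_female_sig = sum(is_cont and cell == "F*" for _, is_cont, cell in cells)
second_year_total = sum(n == 122 for n, _, _ in cells)
second_year_female_sig = sum(n == 122 and cell == "F*" for n, _, cell in cells)

print(total, female_sig, male_higher, male_sig)
print(control_female_sig, second_year_total, second_year_female_sig)
print(f"share significantly favoring females: {female_sig / total:.3f}")
```

The tally agrees with the figures in the text: 54 comparisons, 35 significantly favoring females, one nonsignificant male advantage, and 19 significant control-female advantages.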

Gains by sex. The summary presented above indicates that,
whatever may be the situations at the extremes of test score
distributions, the mean performance of females is consistently
superior to that of males in the group which persists through the
first year of college. Do the males ever "catch up"? The evidence
to answer that question is apparently not in, but there is a
suggestion in the performance of the 88 males and 156 females who
completed testing through the second year of college. Though their
performance does not disclose male superiority, there is a slight
indication that female superiority is decreasing, as the experimental
males on the May 1966 theme attained a mean somewhat, but
not significantly, greater than the female mean. Of the 24 comparisons
presented for these 244 students, 9 show the females
significantly superior; none shows the males significantly superior.
Whether this trend would continue, and whether it indicates a
leveling of performance or the persistence of the better male pairs,
would be difficult to say. Another possibility might be a difference
between males and females in response to the test situation. In any
event, one may firmly conclude by reiterating a statement made in
the Interim Report ". . . that in investigations concerning competence
in composition the ratio between sexes must be taken into
account in the groups whose performance is being studied." (p. 65).

Thus far all comparisons of male and female performance 
reflect the relationship between the sexes on specific testing 
occasions. They may be summarized by saying that the mean of the 
females was superior at the beginning of the freshman year and 
remained so during the freshman year. A different question is 
whether one sex appears to benefit from instruction more than the 
other. The tabulation below presents a summary of the gains on 
the objective tests for the 365 pairs completing the first full 
year of the study. Asterisks indicate significant mean gains. 

The difference between male and female performance on themes is 
included only for completeness (the reason for this has been dis- 
cussed on page 42). In the theme portion of the tabulation, 
asterisks represent significant differences between male and female 
means on theme performance as of the testing date. 
                                  Sept.-Jan.      Jan.-May
Test     Subgroup   Sex     N     Mean Gain       Mean Gain

COOP     Exp.        M     134       2.26*           1.87*
COOP     Exp.        F     231       1.71*           2.06*
COOP     Cont.       M     134       2.68*           1.60*
COOP     Cont.       F     231       3.14*           1.49*
CEEB     Exp.        M     134      26.97*           7.01
CEEB     Exp.        F     231      31.98*           0.13
CEEB     Cont.       M     134      20.52*          13.40*
CEEB     Cont.       F     231      33.48*          12.36*

                                                Diff. in Mean                Diff. in Mean
Test     Subgroup   Sex     N     Jan. Theme    (F-M)           May Theme    (F-M)

Theme    Exp.        M     134       9.11                          9.30
Theme    Exp.        F     231      10.44       1.33* (F)          9.82      0.52* (F)
Theme    Cont.       M     134       9.57                          9.39
Theme    Cont.       F     231      10.58       1.01* (F)          9.97      0.58* (F)




On COOP both sexes in both subgroups made significant mean
gains both semesters. During the first semester the experimental
males gained slightly more than the experimental females; during the
second semester the experimental females gained slightly more than
the experimental males. Among the controls, the situation was
reversed.



On CEEB significant mean gains were achieved by the
control males and females both semesters, but by the experimental
males and females only the first semester. As with COOP, there was
inconclusive evidence concerning the possible superiority of one sex
over the other in gains.



Theme performance. On theme performance, the females
scored means significantly greater than those of the males both 
in January and in May of the freshman year. Following are 
summary statements concerning sex and performance on tests related 
to competence in written composition: 



1. Though the best performance may be by a male, the
mean of female performances is consistently higher,
frequently significantly higher, than the mean for
males.

2. This being true, in investigations concerning student
performance on tests related to composition the proportion
of males to females must be taken into account.

3. The disparity of performance between the sexes in tests
assessing ability in composition persists at least through
the sophomore year of college, with the possible exception
of actual writing performance.



Performance by ability quarters, by sex. It is informative
to examine performance by ability quarters, and within quarters, 
by sex. Performance on each of the three testing instruments is 
first discussed separately, followed by a summary statement. 



COOP - For the 597 matched pairs of students who completed 
the first semester, analysis of the January 1965 COOP 
scores for the two subgroups showed that the control mean 
was significantly higher than the experimental mean. An 
analysis by quarters of ability showed that at all four 
ability levels the control mean surpassed the experimental 
mean, significantly so at the second highest level. When 
the means for males and females within treatments were 
compared at each ability level, it was found that in the 
highest and lowest levels the female mean exceeded the 
male mean. The COOP data by ability level did not reveal 
any instances of superiority of experimentals over controls 
or of males over females. 



CEEB - On CEEB, the evidence for the 597 matched pairs 
showed similar January means for the experimental sub- 
group and the control subgroup in terms of the score 
scale involved. The noticeable exception to the overall 
evidence was in the lowest quarter, in which the control 
mean was significantly higher than the experimental mean. 
The analysis by sex by ability quarters showed that the 
control females surpassed the control males significantly 
in the top quarter, and the experimental females sur- 
passed the experimental males significantly in the lowest 
quarter. In each of the two middle quarters, the females 
and males were about equal. 



Theme - On the theme, a significant mean difference 
favoring the controls was found at the end of the first 
semester in the comparison for the complete subgroups and 
in the comparison for the lowest quarter. In the top 
quarter and the third quarter, the subgroup means were 
basically the same. The general finding concerning males
and females was that the females definitely excelled in
the lowest two quarters, more so within the experimental
subgroup than within the control subgroup.

Summary - If one looks at the analysis by sex and by
ability levels of the evidence on the three testing instruments,
the lowest quarter in ability seems to be unique.
Here is to be found the most frequently recurring indication
that control subgroup means are higher than experimental
subgroup means and that female means are higher than male
means.



This summarization has been based upon data through the
first semester. It is for this sample of 597 matched pairs,
the largest sample available for this purpose, that it is
most defensible to utilize the finer analyses by ability
and sex.



Number of Students 

The amount of difference between treatments in treatments 
studies in college freshman composition is almost certain to be 
small. Students are in their twenty-fifth and twenty-sixth semes- 
ters of instruction in composition, having begun learning to write 
in the first grade. It is characteristic in situations in which 
instruction has been carried on over an extended period of time 
that the greatest change occurs early in the instruction, with the 
curve flattening out as instruction continues. With composition, 
rapid increases occur at the fifth and the seventh grade (cf.
O'Donnell, p. 90). Though the O'Donnell study does not extend
beyond the seventh grade, data in Hunt (p. 37) suggest that the 
mean improvement each year from the fourth to the twelfth grade is 
about 5 percent. Braddock (p. 7) gives the same figure. 



The test for statistical significance is an estimate of the 
probability that the difference which is observed is so small as 
to be attributable to chance, or so large as to be attributable to 
instruction, or instruction plus maturation. Significance at the 
5 percent level says that the difference observed would be likely 
to occur by chance only five times in one hundred if the given 
research were replicated. In working with probabilities, the
chance that a given mean difference will be significant is greater
for a large sample than for a small one. As may be seen from the
discussion in this report (see discussion of first COOP table, p. 54),
the chances of attaining a significant difference increase dramatically
with an increase in the size of the experimental sample.
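The dependence of significance on sample size can be illustrated with a small calculation. The sketch below is an illustration only, not a reanalysis of the study's data: it assumes two independent groups of equal size, a hypothetical mean difference of 1.0 score point, and a hypothetical common standard deviation of 7.5 (roughly the size of the COOP standard deviations reported earlier). The two-tailed probability uses the normal approximation, which is itself an assumption and is rough for the smallest groups. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_ratio_equal_groups(mean_diff, sd, n_per_group):
    """t-ratio for two independent groups of equal size and common S.D."""
    return mean_diff / (sd * math.sqrt(2.0 / n_per_group))

def two_tailed_p_normal(t):
    """Two-tailed probability under the normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

MEAN_DIFF = 1.0  # hypothetical difference between subgroup means
SD = 7.5         # hypothetical common standard deviation

for n in (25, 100, 400, 1600):
    t = t_ratio_equal_groups(MEAN_DIFF, SD, n)
    p = two_tailed_p_normal(t)
    print(f"n per group = {n:5d}   t = {t:5.2f}   approx. p = {p:.4f}")
```

With the difference and the standard deviation held fixed, the t-ratio grows with the square root of the group size, so quadrupling the sample doubles the ratio. A difference that is far from significant with 100 students per group becomes significant at the 1 percent level with 1,600.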
Thus when an investigator anticipates a small difference
at best, a large sample is almost mandatory. In the opinion of
the investigators, at least 100 pairs should be involved. It was
partly to attain a larger sample that the investigators included
other universities in the present study.



Matched Pairs



Particularly if an investigator is to follow college students
beyond a single semester, the matched pairs design has the advantage
of assuring continued comparability of the experimental and control
subgroups. Students withdraw from college for many different
reasons. The greater the number of academic terms (quarters or
semesters) over which an experiment runs, the greater the number
of students who will leave. The matched pairs design assures that
such departures will not result in disparate subgroups. Particularly
in view of the importance of male-female differences in composition
performance (see pp. 167-168), differing proportions of men and women
in the subgroups studied could easily distort results.



Another advantage of the matched pairs design (if students
are matched exactly on one of the evaluative instruments) is that it
permits a check on some of the computer programs developed to produce
the statistical summaries and analyses. For example, in the present
study an error on one of the computer runs occurred in the calculation
of the t-ratios, used to test the significance of the difference
between the experimental and control subgroup means. Had the pairs
not been matched exactly on theme score, the error might have gone
unnoticed. Similarly, the correlation coefficients between scores
earned on two administrations of one of the objective tests were
suspiciously low. Inspection of the data revealed that some of the
scores had been aligned erroneously: Student A's score was assigned
to Student B, and this had produced the low correlation coefficient.
The ease with which such internal checks may be made certainly
recommends the matched pairs design. Both of these checks on the
accuracy of the data would have been difficult or impossible in a
covariance design; both were easily made with the matched pairs
design.
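The two internal checks described above can be sketched in a few lines. The data below are invented for illustration and the function names are hypothetical; nothing here reproduces the investigators' programs. The first check confirms that pairs matched exactly on an initial score show identical scores, so that any nonzero t-ratio on that variable signals a computational error. The second shows how misaligned records depress the correlation between two administrations of the same test.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def matched_exactly(exp_scores, cont_scores):
    """True when every experimental score equals its partner's score."""
    return len(exp_scores) == len(cont_scores) and all(
        e == c for e, c in zip(exp_scores, cont_scores)
    )

# Check 1: initial theme ratings for pairs matched exactly (invented data).
exp_initial_theme = [8, 9, 10, 7, 11]
cont_initial_theme = [8, 9, 10, 7, 11]
print("pairs matched exactly:", matched_exactly(exp_initial_theme, cont_initial_theme))

# Check 2: two administrations of an objective test (invented data).
first_admin = [150, 155, 158, 160, 163, 165, 168, 170, 172, 175]
second_admin = [152, 156, 161, 161, 166, 166, 171, 171, 175, 177]
aligned_r = pearson_r(first_admin, second_admin)

# Simulated clerical error: second-administration scores shifted so that
# each student is credited with another student's score.
misaligned = second_admin[3:] + second_admin[:3]
misaligned_r = pearson_r(first_admin, misaligned)

print(f"aligned r = {aligned_r:.3f}, misaligned r = {misaligned_r:.3f}")
```

A sharp drop in the correlation between administrations, as in the misaligned case, is the kind of symptom that would prompt an inspection of how the score records were assembled.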



A further characteristic of the matched pairs design is that 
it permits the calculation of correlations between the subgroups. 
These across-subgroup correlations are, of course, essential for 
some of the analyses. 



Thus, though the matched pairs design increases the rate of
attrition, the investigators feel that the increased ease in making
comparisons, in maintaining the similarity of the groups on the
matching characteristics, and in detecting errors more than
compensates for the difficulties. A more extensive presentation
of the considerations which led to the employment of the matched
pairs design is in Appendix B.



Themes as Tests 



Though themes are properly used as evaluative instruments
in research in composition, their use creates problems. Braddock
(1963, pp. 6-15) discusses these problems at length and makes
recommendations concerning ways of attempting to meliorate them.
The present investigators, now completing their third treatments
study, also have recommendations.

The central problem in using themes as evaluative instruments
is that they do not lend themselves either to useful
quantification or to consistent evaluation. To quantify demerits
(for misspelling, for poor reference, for anemic development) will
provide numbers which may then be manipulated, thus giving an
impression of certainty and objectivity. But it is an illusion.
The fact is that themes are primarily aesthetic objects, and
judgments concerning them are aesthetic judgments. Each theme
is unique; each judgment is to a considerable degree an expression
of personal preference. Though themes may nonetheless be used in
treatments research, the uniqueness of each theme and the large
degree of subjectivity in the rating force great care in evaluating
procedures. In terms of these and other considerations, we make the
following recommendations:

1. Students, or groups, should be matched exactly on sex.
The present study demonstrates that among college freshmen,
females as a group score higher on both objective
and subjective tests of writing than do males. Failure
to match on sex may easily lead to erroneous conclusions.



2. Since themes are unique aesthetic objects, they are
influenced to some degree by the conditions under which
they are produced. Performances on different testing
occasions (such as at the beginning and at the end of a
semester) should not be compared. Gain (change) scores
are likely to be more misleading than enlightening. Rather,
the students should be matched exactly on the evaluation of
their initial performance, and compared on the second
performance.

Thus the second recommendation is that subgroups be matched
on theme performance at the beginning of the study, and that
the effectiveness of the treatment be assessed in terms of
performance at the conclusion of the study.
3. The third recommendation is that only one topic be used
for each test theme. Though it is true that some students
will not do their best on the topic, it is also true that
others will. The only way to provide more than one performance
is to provide a different topic, and since the
investigators believe each topic is unique, it seems better
to assume that, as a group, the performance on one theme
would be the same as the performance, as a group, on a
second theme.* A corollary of this recommendation is that
readers should evaluate only one topic at a time.




4. In treatments studies within a university, readers should
follow an analytic procedure based upon the aspects or 
elements of composition stressed in the course. The deter- 
mination of the efficacy of each treatment in producing the 
desired writing behavior is the goal of treatments research 
in composition. "The desired writing behavior" needs to be 
clearly defined and understood by the evaluators. 




5. Testing conditions should be the same for both the experi- 
mental and the control subgroups. Not similar; the same. 
The best arrangement is for all participants in the study 
to write at the same time in the same room. 

6. In a general investigation, such as the present one, in
which the question is whether two samples of students drawn
from five universities with five different composition
programs perform differently when one sample has received
instruction and the other has not, the theme evaluation
must be general (wholistic) rather than analytical.
(Recommendation 4 refers to a specific course in a single
institution, in which different procedures for attaining
the same specific goals are being investigated.)



*Braddock advocates the use of two themes on each testing
occasion, the better performance for each student to be used in the
comparison, and a choice of topics. When the University of Iowa
accepted the invitation to become a part of this study, Dr. Braddock
requested and was granted permission to apply to the USOE for a
separate grant. The grant was forthcoming. Using themes written for
the present study in September 1964 and May 1966, together with
separate themes written for him, by the same students, on each of
those dates, Braddock made a comparison using the better theme by each
student on each date. The study, Evaluation of College-Level Instruction
in Freshman Composition: Part II, is complete and may be obtained
by writing to Richard Braddock, Rhetoric Program, University of Iowa.
RECOMMENDATIONS

A characteristic of our research which was both a strength 
and a complication was the presence of multiple criteria. Three 
tests were employed, and these were administered at three different 
junctures in the students' first two college years. Furthermore, 
data were analyzed for both a constant N over the three testing 
occasions and for the maximum N on each testing occasion. The 
results, therefore, are not represented by a single, quantitative 
index. Instead, there are 18 sub-comparisons. The findings are 
not consistent among these. An inevitable characteristic of 
longitudinal research is some attrition of sample members. It was 
beyond the scope of the investigation to study directly the 
participants who dropped out at successive stages. Information such 
as overall scholastic averages, majors, and grades in specific 
courses might or might not have been useful in harmonizing the 
findings in the 18 comparisons. 



With full realization of the complexities and the difficulty 
of arriving at a definitive interpretation of the evidence, the 
investigators offer some rather definite recommendations. 



1. The investigators do not recommend the elimination of freshman 
English composition at this time. 

Data from this study suggest that required freshman 
composition as it was taught in the participating state 
universities during the period of this study had a 
definite effect on performance of the students tested 
at the end of the first semester, a less definite effect 
at the end of the second semester, and no effect at the 
end of the fourth semester. Because there was some 
evidence of superiority favoring those with composition 
instruction at two testing periods, the investigators 
do not recommend the elimination of freshman composition. 

2. The investigators recommend that if the course is continued as 
a requirement, innovative practices be tried and their value 
assessed. 

The data do not strongly support the types of composition 
programs studied in this report; the investigators 
recommend further studies exploring the results of 
instruction centered on the new rhetorics, the new grammars, 
the production of films as stimulants to writing, small 
group instruction, individual instruction, speaking as a 
base for writing, and similar techniques which have been 
developed since the inception of the present study,
to see if such approaches might be effective.

3. Course objectives in freshman composition should be stated
in the most specific terms possible.

"Improvement in writing" is a vague goal to set for the
freshman course. It is particularly vague in view of
the fact that the amount of improvement which may be
expected, in the twenty-fifth and twenty-sixth semesters
of the students' exposure to some kind of instruction in
writing, is very small. In such a situation, one must
specify "improvement" very carefully.

4. The present study contains basic information on test
performance for a group of students who had proceeded through
one, two, and four semesters of college without direct
instruction in freshman composition. These data constitute
a bench mark against which the performance of other groups
can be compared. The investigators recommend such use.

5. The investigators recommend that institutional norms and
national norms for tests designed to measure performance in
writing be set up for males and females separately. Results
of research which does not separate male and female
performance should be interpreted with care.




APPENDIX A 



Theme Topics and Instructions 



The principles followed in selecting topics, the use of a 
single topic on each test administration, and the equivalence 
of topics actually employed need to be discussed briefly. 

Three criteria were established in selecting topics for
the theme tests: the topic must be of a middle level of
abstraction, it must be related to the students' experience, and
it must call for an individual rather than a stock response. A
middle level of abstraction avoided favoring either the students
who were skillful in exploring general principles or the
students who happened to have special knowledge related to a
specific topic. A topic related to the students' experience and
knowledge allowed them to support and illustrate their general
statements with particulars readily available to them. A topic
calling for an individual rather than a stock response provided
a test of the students' ability to establish and support an
original thesis.

The use of a single topic rather than a choice among several 
topics on each testing occasion avoided the introduction of an 
additional variable whose influence would be difficult to estimate.
Such a restriction seemed justified by the fact that the 
students' performance as individuals was not under investigation. 
There is no reason to believe that if the students had had a 
choice of topics, comparison of their group performance would 
have been different from that resulting from a single topic. 

Equivalence of topics across testing occasions was not
vital, as students' change scores on theme performance were not 
considered in the conclusions in this study. Though it was 
hoped that the topics used would be comparable to one another, 
any lack of similarity which may be present cannot be used 
meaningfully in speculation about the results achieved. The 
subgroups were compared with one another on their performance 
at each testing occasion. Changes from occasion to occasion 
within subgroups were not investigated. 
On the following pages are the instructions and theme
topics for the various testing sessions. The complete 
instruction sheets, with places for the readers' ratings, the 
name and number of the student, and the like have not been 
reproduced as these details are irrelevant and reproduction 
difficult. It should be noted, however, that the original 
instruction sheets were so arranged that the graders could 
learn neither the student's name nor the date on which the 
paper was written, and the second reader could not see the
rating given the paper by the first reader.
(Theme Instructions for September 1964)



THEME INSTRUCTIONS 



1. The paper which you are about to write will be judged on your 
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard written 
English. You should think about the topic until you have deter- 
mined what idea you want to convey to the reader and the general 
procedure you will follow in doing so. Then you may write your 
paper. Do not hesitate to make a brief outline if you desire to 
do so (use the back of this sheet). An outline is not required.

2. You should write as neatly and legibly as you can, but you should 
not hesitate to make changes between the lines if you believe 
them to be necessary. You do not have to copy the paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. Begin on the third line of the first sheet, and WRITE ON EVERY 
LINE THEREAFTER. 

5. You must write with INK or BALL-POINT PEN. 

6. Be certain to write your STUDENT NUMBER in each of the blanks 
(two at the top, one at the bottom) provided for it on this 
sheet, and in the upper right-hand corner of each page of 
your theme. 

7. Turn in all of the paper given to you. 

8. You must stay at least one hour and fifteen minutes. 

9. LENGTH: 300 - 500 words. 



Today a young man who wears a beard or a girl who prefers slacks to 
skirts has difficulty in finding employment in most work which serves 
the public. Changes in fashion are announced one day and adopted 
the next. In business, promotions are made with great emphasis upon 
how well an individual meets the "image" the employer wishes to 
create. In school, those who do as they are told and give the answers 
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expected of them are rated high by many of the faculty; those who 
do what "everyone else" does are popular with the students. 

Now consider a famous quotation: "Whoso would be a man must be a 
non-conformist." 

Relate the material in the opening paragraph to the quotation, 
indicating whether, on the basis of your observation and experience, 
you feel the idea expressed in the quotation is true. 
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(Theme Instructions for January 1965) 



THEME INSTRUCTIONS 



1. The paper which you are about to write will be judged on your 
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard written 
English. You should think about the topic until you have deter- 
mined what idea you want to convey to the reader and the general 
procedure you will follow in doing so. Then you may write your 
paper. Do not hesitate to make a brief outline if you desire 
to do so (use the back of this sheet). An outline is not required. 

2. You should write as neatly and legibly as you can, but you should 
not hesitate to make changes between the lines if you believe 
them to be necessary. You do not have to copy the paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. Begin on the third line of the first sheet, and WRITE ON EVERY 
LINE THEREAFTER. 

5. You must write with INK or BALL-POINT PEN. 

6. Be certain to write your STUDENT NUMBER in the blank provided 
at the top of this instruction sheet in the upper left-hand 
corner under the Total Score box. It should also be written 
on each page of your theme. Do NOT write your name, or the 
name of your school, in any place other than the blank provided 
at the bottom of this sheet. 

7. Turn in all of the paper given to you. 

8. You must stay at least one hour and fifteen minutes. 

9. LENGTH: 300 - 500 words. 



TOPIC 



In the United States, popular entertainment reflects the ideals 
of the great middle class of people. For example, we seldom see or 
read of a young couple struggling to make ends meet, of psychological 
problems that cannot be resolved, of the blood that accompanies 
violent death, of the horrors of war, or of the wearing routine of 
life day in and day out. On the contrary, no problem is too complex 
for solution, no disaster occurs to the Good, no reward to the Bad. 
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It is hardly too much to say that most young people in the United 
States form their expectation of their lives as adults from the 
distorted image presented by television, movies, and books rather 
than from their observations of the lives of the adults about 
them. 

Reflect upon these statements and determine whether you agree 
or disagree with them or feel that they should be modified in some 
way. Then write a paper indicating the manner in which your 
experience and knowledge have led you to the conclusion you have 
reached. 
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(Theme Instructions for May 1965) 



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your 
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard written 
English. You should think about the topic until you have deter- 
mined what idea you want to convey to the reader and the general 
procedure you will follow in doing so. Then you may write your 
paper. Do not hesitate to make a brief outline if you desire to 
do so (use the back of this sheet). An outline is not required. 

2. You should write as neatly and legibly as you can, but you 
should not hesitate to make changes between the lines if you 
believe them to be necessary. You do not have to copy the paper 
over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. Begin on the third line of the first sheet, and WRITE ON EVERY 
LINE THEREAFTER. 

5. You must write with INK or BALL-POINT PEN. 

6. Be certain to write your STUDENT NUMBER in the blank provided 
at the top of this instruction sheet in the upper left-hand 
corner under the Total Score box. It should also be written on 
each page of your theme. Do NOT write your name, or the name 
of your school, in any place other than the blank provided at 
the bottom of this sheet. 

7. Turn in all of the paper given to you. 

8. You must stay at least one hour and fifteen minutes. 

9. LENGTH: 300 - 500 words. 



TOPIC 



As society becomes increasingly complex, the number of people upon 
whom we are dependent increases. Daniel Boone killed a bear and 
ate it. When we buy steak, we purchase the services of the person 
who produced the animal, the person who fattened it, the person who 
took it to market, the packing company which bought it, slaughtered 
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it, and dressed it, the trucker who transported it to the store 
from which we bought it, and, of course, the grocer himself. Each 
person must do his part if we are to have the steak. Even this 
picture is greatly over-simplified. There are, for example, the 
gasoline which fueled the truck and the truck itself. Considering 
the interdependence illustrated by the story of the steak, how free 
are we to guide our own lives? Are we liberated from stalking, 
killing, skinning, and cleaning our dinner, or are we robbed of 
our independence? Can we say, as Henley did, "I am the master of 
my fate, /I am the captain of my soul"? Does modern technology 
liberate us or dominate us? Present your opinion, based upon your 
knowledge, observation, and experience. 
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(Theme Instructions for May 1966) 



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your 
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard written 
English. You should think about the topic until you have deter- 
mined what idea you want to convey to the reader and the general 
procedure you will follow in doing so. Then you may write your 
paper. Do not hesitate to make a brief outline if you desire to 
do so (use the back of this sheet). An outline is not required. 

2. You should write as neatly and legibly as you can, but you should 
not hesitate to make changes between the lines if you believe them 
to be necessary. You do not have to copy the paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. Begin on the third line of the first sheet, and WRITE ON EVERY 
LINE THEREAFTER. 

5. You must write with INK or BALL-POINT PEN. 

6. Be certain to write your STUDENT NUMBER in the blank provided 
at the top of this instruction sheet in the upper left-hand 
corner under the Total Score box. It should also be written 
on each page of your theme. Do NOT write your name, or the 
name of your school, in any place other than the blank provided 
at the bottom of this sheet. 

7. Turn in all of the paper given to you. 

8. You must stay at least one hour and fifteen minutes. 

9. LENGTH: 300 - 500 words. 



TOPIC 



Conventional is a word frequently used to refer to customary 
attitudes, beliefs or actions. In the United States it is a 
convention for men to be clean-shaven, women to wear a certain 
amount of make-up, boys to be interested in sports, and girls to 
be interested in becoming wives and mothers. A person who is 
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unconventional in some way departs from the conventions of action 
or belief of the society of which he is a part. 

With this explanation in mind, discuss the following statement: 

"Convention is society's safeguard, but also its potential 
executioner." To what extent and in what ways do you agree with 
this statement? Use examples and details from your knowledge and 
experience to support your conclusion. 
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APPENDIX B 



Choice of Experimental Design 



In planning research, some of the most complex questions 
are concerned with the choice of experimental design. The 
questions are both theoretical and functional. These two kinds 
of consideration come together when one finally must decide the 
best way, under the circumstances in which a given study will 
be made, to collect and analyze data for meaningful samples of 
students. In the present study, three circumstances dictated the 
choice of a matched pairs design. 

The first circumstance was the college administration's 
stipulation that students who were to receive the experimental 
treatment be informed of the fact prior to their registration. 

It seemed essential that such students, their parents, and the 
faculty advisors be given advance information about the purpose 
of the research and its impact on them. These experimental 
students would not receive instruction in freshman composition— 
a major departure from normal college experience. Given the 
faith of students in the importance of composition, to have 
denied them enrollment on registration day without prior warning 
could have induced anxiety and resentment, possibly producing a 
kind of "reverse" Hawthorne effect. Added to this would have 
been confusion in registration, irritation among advisors, and 
concern among parents. 



Thus the investigators were compelled to select, in advance 
of September registration, the students who would receive the 
experimental treatment. As described on page 38, this procedure 
involved selecting a pool of students from those who, by approxi- 
mately July 1, 1964, had met admission requirements and expressed 
their intention to enroll in the given institution. There was, 
of course, no assurance that all of the selected pool would actually 
enroll. This pool, which was a random sample from the July list, 
would not be a random sample of the September freshman class. That 
is, some entering freshman students had no opportunity to be 
included, and some who were included in the July group did not enroll. 

A second circumstance was the duration of the investi- 
gation. The experimental design called for the students to be 



¹Jewell, Ross M. and Gordon J. Rhum, The Relative Effectiveness 
of Two Methods of Instruction in College Freshman Composition: 
Closed-Circuit Television and 'Normal' Classroom. Cedar Falls, Iowa: 
State College of Iowa, February, 1966, p. 48. 
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tested through the end of their sophomore year. That relatively 
heavy attrition would occur was certain;* that it would have an 
equal effect on both treatment groups seemed unlikely. Among 
other considerations, the control students would be enrolled in 
a course which frequently causes students trouble, while the 
experimental students would not. In any event, the possibility 
that attrition would occur in such a way that the two treatment 
groups would become progressively dissimilar could not be ignored. 

Related to the attrition problem was the importance of 
maintaining the same ratio of males to females in both of the 
subgroups. The investigators believed, and their belief is sup- 
ported by data subsequently examined (see page 140), that females 
would perform somewhat better than males on measures of compo- 
sition ability. Should the ratio between sexes in one group 
become substantially different from the ratio in the other group, 
the likelihood of distorted results would be present. 

A third circumstance was the audience which would read 
the research. As the investigation concerns the effectiveness 
of a course usually taught in departments of English, members 
of English departments would be the group for whom the report was 
primarily intended. It seems fair to say that such an audience 
would have considerable difficulty in following the intricacies 
of analysis of covariance. Though this consideration may at 
first seem somewhat frivolous, its pertinence to the potential 
impact of the project is nonetheless real. 

In the light of these circumstances, the investigators 
became convinced that the matched-pairs design should be employed. 
Matching after September registration insured a list of students 
who were actually enrolled. Use of the matched pairs design 
with sex as one criterion made certain that the ratio between 
males and females would be the same for both subgroups not only 
at the beginning, but at any subsequent point in the investigation. 
Use of matched pairs minimized the possibility that in the 
attrition which would occur over the life of the experiment some 
factor would operate unequally to reduce the similarity of the 
subgroups. Finally, use of matched pairs enabled the investi- 
gators to present results in a manner which would make them 
readily available to members of English departments and directors 
of freshman composition. 
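
The way a matched-pairs design keeps the sex ratio equal under attrition can be sketched in a few lines. This is only an illustration: it assumes that a pair is dropped from analysis as soon as either member withdraws, which the text implies but does not state, and the `Pair` record, student identifiers, and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One matched pair; both members share the same sex by construction."""
    pair_id: int
    sex: str           # "M" or "F", a matching criterion
    experimental: str  # hypothetical student identifier
    control: str       # hypothetical student identifier


def surviving_pairs(pairs, withdrawn):
    """Keep a pair only if neither of its members has withdrawn (assumed rule)."""
    return [
        p for p in pairs
        if p.experimental not in withdrawn and p.control not in withdrawn
    ]


def sex_counts(pairs):
    """Count pairs by sex; each subgroup holds exactly these counts of students."""
    counts = {}
    for p in pairs:
        counts[p.sex] = counts.get(p.sex, 0) + 1
    return counts


pairs = [
    Pair(1, "F", "E01", "C01"),
    Pair(2, "M", "E02", "C02"),
    Pair(3, "F", "E03", "C03"),
    Pair(4, "M", "E04", "C04"),
]
withdrawn = {"C02", "E03"}  # one control and one experimental student leave
remaining = surviving_pairs(pairs, withdrawn)
remaining_counts = sex_counts(remaining)
```

Because every surviving pair contributes one student of the same sex to each subgroup, the male-to-female ratio is necessarily identical in the experimental and control subgroups at every testing occasion, whatever the pattern of withdrawals.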



*The Registrar of the University of Northern Iowa 
estimates that the attrition for a freshman class is on the 
order of 19 percent, and the attrition has reached approximately 
40 percent by the end of the sophomore year. 
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The investigators could, of course, have set up the sub- 
groups from among the students whose data were available in July, 
taking first a random sample of the total group, pairing them, 
and then for each matched pair of students randomly assigning 
one member of the pair to the experimental treatment and the other 
member of the pair to the control treatment. However, in July the 
only pertinent test data available for the students was their per- 
formance on ACT English. As the investigators wished to match as 
closely as possible, they decided to wait until more tests could 
be administered during the fall semester orientation period. Doing 
so permitted matching as reported on page 39, by age, sex, theme per- 
formance, and a score derived from performance on the CEEB and COOP. 
This precision in matching provided increased confidence in the 
similarity between the two treatment groups. Closeness in matching 
was also facilitated by the fact that the supply of subjects was 
greater in September than it was in July. 
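
The alternative procedure described above, random assignment of the members of each matched pair to the two treatments, can be sketched as follows. The function name, the student identifiers, and the use of a seeded generator are illustrative assumptions, not part of the study's procedure.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Randomly send one member of each matched pair to each treatment.

    `pairs` is a list of (student_a, student_b) tuples that have already
    been matched. A coin flip per pair decides which member receives the
    experimental treatment.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    assignments = []
    for student_a, student_b in pairs:
        if rng.random() < 0.5:
            assignments.append({"experimental": student_a, "control": student_b})
        else:
            assignments.append({"experimental": student_b, "control": student_a})
    return assignments


matched = [("S1", "S2"), ("S3", "S4"), ("S5", "S6")]  # hypothetical identifiers
assigned = assign_within_pairs(matched, seed=1964)
```

Each pair stays intact; only the direction of assignment within the pair is left to chance.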

Three additional points. Since there were only two treat- 
ment groups, the matched pairs approach was more feasible than if 
there had been several treatment groups. Secondly, the investi- 
gators did not have to use, indeed did not wish to use, intact 
classroom groups for the control treatment. Finally, in methods 
experiments generally, random samples of a real population are not 
attainable. Near-randomness is achieved only in the beginning 
stages, and not in the groups which actually complete the experi- 
mental period. 
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APPENDIX C 



Procedure for Evaluating Themes 



Prior to each reading session, Mr. Jewell would send to 
Mr. Godshalk about forty themes, selected at random. From this 
sample Mr. Godshalk would determine the general nature of the 
total set of themes. He would choose a number of themes that in 
his judgment were typical of range and treatment, and Mr. Jewell 
would have these duplicated. These became the sample themes used 
during the reading as practice themes. 

Mr. Godshalk’s main responsibility when the raters (the 
smallest number was nine) had assembled was to communicate to 
them the criteria for evaluating the papers. First, he would 
have Mr. Cowley and Mr. Jewell describe the purpose of the 
investigation, the circumstances under which the papers had been 
written, and the students who had written them. He would then 
explain the rating scale. When all questions concerning its 
application had been answered, he would distribute several sample 
themes to be rated. After he had made a tally of the various 
values assigned to these papers, he would allow individuals to 
explain their ratings or to question his rating. If a rater 
seemed to be over-reacting to something in the papers, something 
which Mr. Godshalk believed from examination of the sample papers 
was typical, he would so inform the readers and caution them 
against misinterpreting particular aspects of the papers. 

Before setting the readers to work in earnest, he would 
remind them that since they were experienced readers their first 
judgment of a theme as a whole was probably as valid as any sub- 
sequent judgment they might make of the same paper. Therefore, 
they were not to pause and consider but were to read and respond. 

As the rating session progressed, Mr. Godshalk would note whether 
any particular rater seemed to judge consistently in a way 
different from the other raters. At relatively frequent inter- 
vals, he would interrupt the reading to allow the readers to relax 
and would read aloud papers which had been passed on to him by 
individual readers. Frequently, these papers posed special 
problems which Mr. Godshalk would have the group discuss, always 
making clear his own judgment. The goal of the initial orientation 
and of the subsequent breaks in the reading was for Mr. Godshalk to 
convey to the readers his criteria and to get them to standardize 
their scoring so that they would agree in their ratings. The 
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reading would be most "perfect" when all of the readers rated all 
the papers in the same way that Mr. Godshalk would rate them. In 
practice his standards would be slightly altered if a consensus 
indicated they should be. Thus, the validity of the evaluation 
could be no greater than the validity of Mr. Godshalk's criteria 
as modified on occasion by discussion with the readers. 
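
One simple way to spot a rater who judges consistently higher or lower than the others, as described above, is to compare each rater's scores with the group mean for the same themes. The sketch below is an illustration only: the report does not describe any such computation, and the rating values, rater labels, and function name are hypothetical.

```python
from statistics import mean


def rater_bias(ratings):
    """Return each rater's average deviation from the group mean.

    `ratings` maps a rater label to a dict of {theme_id: score}.
    A positive value means the rater tends to score above the group,
    a negative value below it.
    """
    # Collect every score given to each theme.
    scores_by_theme = {}
    for scores in ratings.values():
        for theme_id, score in scores.items():
            scores_by_theme.setdefault(theme_id, []).append(score)
    theme_mean = {t: mean(v) for t, v in scores_by_theme.items()}

    # Average each rater's deviation over the themes that rater scored.
    return {
        rater: mean(score - theme_mean[t] for t, score in scores.items())
        for rater, scores in ratings.items()
    }


# Hypothetical practice-theme ratings on an assumed numeric scale.
practice = {
    "Reader A": {"theme 1": 3, "theme 2": 2},
    "Reader B": {"theme 1": 3, "theme 2": 2},
    "Reader C": {"theme 1": 1, "theme 2": 1},
}
bias = rater_bias(practice)
```

In this made-up tally, Reader C scores a full point below the group on average, which is the kind of pattern the chief reader would raise at the next break in the reading.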
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PERFORMANCE IN INDIVIDUAL UNIVERSITIES 



The project was designed to yield information regarding 
instruction in college freshman composition at state-supported 
universities offering varied freshman composition programs. In 
the light of the broad purpose of the study, comparison of 
performance among individual universities was neither a primary 
nor a secondary objective. That is, no attempt has been made to 
assess the apparent effectiveness or lack of effectiveness of the 
programs at the individual universities. Rather, the focus has 
been on the total group of students. 

The investigators agreed that each university should 
receive a report of the results relating to its own students. 
Presented on pages 182 to 190 are summary tables of some of the 
basic facts of the performance at individual universities. 
Interpretation of the evidence in these tables must be tentative, 
primarily because the samples are so small. Also because of small 
samples, no summary of May 1965-May 1966 performance is included. 

Below are summary statements based on Tables D-I through 
D-IX. In preparing the following summary statements the 
investigators identified only what seemed to be the most prominent 
departures from the composite picture for the participating 
universities. 



First Semester (September 1964-January 1965) 

COOP. (September to January gains) 

1. Gains for university 1 were in general greater than 
those for any other university. 

2. For university 5, gains made by the controls, both 
males and females, were relatively low. 

CEEB. (September to January gains) 

1. The gains at university 1 were in general greater 
than those for the combined universities. 

2. In universities 2 and 3, the mean gains were generally 
below those for the combined universities. 

3. University 5 is special, as it was the only one at 
which the experimentals ended the semester higher 
than the controls. 
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Theme . (January 1965 means only, no gain scores for 
themes analyzed) 



1. In university 1, the control minus experimental 
mean difference was greater than it was for the 
combined universities. 

2. In university 2, the experimentals performed 
somewhat better than the controls. 

3. In university 3 the mean theme scores were lower 
than the mean for the combined universities. 
This was especially true of the males. 

Second Semester (January 1965-May 1965) 

COOP. (January to May gains) 

1. University 5 had the greatest gains. 

2. University 2 had the smallest gains. 

3. In university 1 the control minus experimental 
value was greatest. 

4. Among the males, the mean gains in university 1 were 
smallest. 

CEEB. (January to May gains) 

1. In university 1 the control minus experimental 
difference was relatively large. 

2. In university 5 at the end of the freshman year, the 
mean for experimentals was greater than the mean for 
controls. 

3. Within universities there was fluctuation in mean 
gains between experimentals and controls, males and 
females. 

Theme. (May means only, no gain scores for themes analyzed) 

1. University 2 had mean theme scores higher than 
those for the combined universities. 
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2. University 3 had mean theme scores lower than those 
for the combined universities. 

3. Difference in theme score in favor of controls was 
greatest at university 3. 



First Year (September 1964-May 1965) 

COOP. (September to May gains) 

1. The greatest gain was made by the experimental males 
at university 5. 

2. The greatest control minus experimental difference 
at the end of the second semester was at university 1. 

CEEB. (September to May gains) 

1. The highest gains were made by the female controls 
at university 1. 

2. The lowest gains were made by female experimentals at 
university 2. 

3. The greatest control minus experimental mean difference 
in May 1965 was at university 1. 

Theme. Summary above under Second Semester. 
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^Significant at 0.05 level (two-tailed test). 
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university 4 were limited and incomplete and therefore not included in the totals. 
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THE PERFORMANCE OF 365 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION BOARD 
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Controls 4*58 3.467* 5.47 4.211** 135 

*Significant at 0.05 level (two-tailed test). 



TABLE D-VIII(1) 

THE PERFORMANCE OF 137 MATCHED PAIRS OF STUDENTS ON THE COOPERATIVE ENGLISH TESTS: ENGLISH EXPRESSION IN 
SEPTEMBER 1964 AND MAY 1965; INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, 
t-RATIOS, AND COMPARISONS BETWEEN THE SEXES: UNIVERSITY 1 

TABLE D-XV(1) 

THE PERFORMANCE OF 43 MATCHED PAIRS OF STUDENTS ON COOPERATIVE ENGLISH TESTS: 
ENGLISH EXPRESSION IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; INCLUDING 
DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 1 

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in Jan.
          Sept. 1964      Jan. 1965       Means, Jan.         t-Ratio  Means, Cont.         t-Ratio
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     df=42    minus Exper.   r     df=42
Exper.    162.91  6.76    164.56  6.05    1.65          0.74  2.274*   2.40           0.65  2.795*
Control   162.30  6.78    166.95  7.09    4.65          0.71  5.730*

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May
          Jan. 1965       May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Jan.    r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    164.56  6.05    166.63  6.30    2.07          0.64  2.555*   1.07           0.41  0.945
Control   166.95  7.09    167.70  7.10    0.74          0.72  0.908

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May
          Sept. 1964      May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    162.91  6.76    166.63  6.30    3.72          0.70  4.771*   Same as above
Control   162.30  6.78    167.70  7.10    5.40          0.75  7.048*

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May 1966
          May 1965        May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus May     r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    166.63  6.30    167.88  7.52    1.26          -0.06 0.805    0.58           -0.13 0.308
Control   167.70  7.10    168.47  8.73    0.77          0.07  0.458

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May 1966
          Sept. 1964      May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    162.91  6.76    167.88  7.52    4.98          0.03  3.236*   Same as above
Control   162.30  6.78    168.47  8.73    6.16          0.29  4.250*

*Significant at 0.05 level (two-tailed test). 

TABLE D-XVI(1) 

THE PERFORMANCE OF 43 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION 
BOARD ENGLISH COMPOSITION TEST IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; 
INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 1 

          College Entrance Exam.
          Board Stan. Rating              Diff. in                     Diff. in Jan.
          Sept. 1964      Jan. 1965       Means, Jan.         t-Ratio  Means, Cont.         t-Ratio
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     df=42    minus Exper.   r     df=42
Exper.    473.30  80.17   524.05  74.83   50.74         0.66  5.108*   -5.21          0.40  0.384
Control   480.86  80.05   518.84  84.42   37.98         0.51  3.013*

          College Entrance Exam.
          Board Stan. Rating              Diff. in                     Diff. in May
          Jan. 1965       May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Jan.    r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    524.05  74.83   523.63  67.17   -0.42         0.59  0.042    26.79          0.36  2.049*
Control   518.84  84.42   550.42  81.01   31.58         0.66  2.990*

          College Entrance Exam.
          Board Stan. Rating              Diff. in                     Diff. in May
          Sept. 1964      May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    473.30  80.17   523.63  67.17   50.33         0.56  4.655*   Same as above
Control   480.86  80.05   550.42  81.01   69.56         0.64  6.581*

          College Entrance Exam.
          Board Stan. Rating              Diff. in                     Diff. in May 1966
          May 1965        May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus May     r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    523.63  67.17   536.91  79.76   13.28         0.61  1.308    16.65          0.54  1.482
Control   550.42  81.01   553.56  71.17   3.14          0.70  0.340

          College Entrance Exam.
          Board Stan. Rating              Diff. in                     Diff. in May 1966
          Sept. 1964      May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    473.30  80.17   536.91  79.76   63.60         0.59  5.717*   Same as above
Control   480.86  80.05   553.56  71.17   72.70         0.59  6.803*

*Significant at 0.05 level (two-tailed test). 
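
The t-ratios for gains in these tables can be approximately reproduced from the reported means, standard deviations, and correlations. The sketch below is a reader's reconstruction, not the investigators' stated procedure: it assumes that r is the correlation between the two sets of scores being compared, that the reported standard deviations were computed with n in the denominator, and that the standard error of the mean difference therefore uses n - 1. The function name is hypothetical; the input values are the September 1964 and January 1965 CEEB figures for university 1.

```python
import math


def paired_t_from_summary(mean_first, sd_first, mean_second, sd_second, r, n_pairs):
    """Approximate a correlated-samples t-ratio from summary statistics.

    The variance of the paired differences follows from the two standard
    deviations and their correlation:
        var_diff = sd1^2 + sd2^2 - 2 * r * sd1 * sd2
    Dividing by (n_pairs - 1) assumes the SDs used n as divisor (an assumption).
    """
    mean_diff = mean_second - mean_first
    var_diff = sd_first ** 2 + sd_second ** 2 - 2 * r * sd_first * sd_second
    standard_error = math.sqrt(var_diff / (n_pairs - 1))
    return mean_diff / standard_error


# University 1, CEEB, September 1964 to January 1965, 43 matched pairs.
t_exper = paired_t_from_summary(473.30, 80.17, 524.05, 74.83, 0.66, 43)
t_control = paired_t_from_summary(480.86, 80.05, 518.84, 84.42, 0.51, 43)
```

Under these assumptions the computed values fall close to the tabled 5.108 and 3.013; the small discrepancies are consistent with r being reported to only two decimal places.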
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May 1966 23 548.4 8 75.01 106.74 14.77 " 9.87 2.05 

¹Combination of Cooperative English Tests: English Expression and College Entrance Examination Board English 
Composition Test, September scores. See page 39. 
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JANUARY 1965 AND MAY 1965; INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, 
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•“Significant at 0.05 level (two-tailed test) 
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TABLE D-XV(2) 



THE PERFORMANCE OF 23 MATCHED PAIRS OF STUDENTS ON COOPERATIVE ENGLISH TESTS: 
ENGLISH EXPRESSION IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; INCLUDING 
DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 2 

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in Jan.
          Sept. 1964      Jan. 1965       Means, Jan.         t-Ratio  Means, Cont.         t-Ratio
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     df=22    minus Exper.   r     df=22
Exper.    166.35  6.60    168.74  7.66    2.39          0.80  2.421*   0.87           0.54  0.605
Control   166.09  6.74    169.61  6.20    3.52          0.75  3.558*

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May
          Jan. 1965       May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Jan.    r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    168.74  7.66    169.48  7.13    0.74          0.67  0.579    1.35           0.65  1.147
Control   169.61  6.20    170.83  5.71    1.22          0.84  1.667

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May
          Sept. 1964      May 1965        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    166.35  6.60    169.48  7.13    3.13          0.90  4.657*   Same as above
Control   166.09  6.74    170.83  5.71    4.74          0.76  5.057*

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May 1966
          May 1965        May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus May     r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    169.48  7.13    171.00  8.11    1.52          0.88  1.836    0.30           0.51  0.200
Control   170.83  5.71    171.30  5.55    0.48          0.88  0.812

          Cooperative English
          Test Converted Score            Diff. in                     Diff. in May 1966
          Sept. 1964      May 1966        Means, May                   Means, Cont.
Subgroup  Mean    S.D.    Mean    S.D.    minus Sept.   r     t-Ratio  minus Exper.   r     t-Ratio
Exper.    166.35  6.60    171.00  8.11    4.65          0.87  5.374*   Same as above
Control   166.09  6.74    171.30  5.55    5.22          0.69  4.960*

•''Significant at 0.05 level (two-tailed test). 
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TABLE B“XVI( 2) 



THE PERFORMANCE OF 23 HATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION 
BOARD ENGLISH COMPOSITION TEST IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; 
INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND 1-RATIOS : UNIVERSITY 2 

          College Entrance Exam. Board Stan. Rating
          Sept. 1964        Jan. 1965         Diff. in Means,                     Diff. in Jan. Means,
Subgroup  Mean     S.D.     Mean     S.D.     Jan. minus Sept.  r     t-Ratio     Cont. minus Exper.   r     t-Ratio
                                                                      df=22                                  df=22
Exper.    547.04   74.24    542.09   81.08    -4.96             0.84  0.525       -15.22               0.44  0.785
Control   548.48   75.01    526.87   89.73    -21.61            0.49  1.207





          College Entrance Exam. Board Stan. Rating
          Jan. 1965         May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Jan.    r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    542.09   81.08    545.69   94.17    3.61              0.76  0.274       6.61                 0.37  0.319
Control   526.87   89.73    552.30   77.89    25.43             0.62  1.611



          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    547.04   74.24    545.69   94.17    -1.35             0.72  0.097       Same as above
Control   548.48   75.01    552.30   77.89    3.83              0.56  0.250





          College Entrance Exam. Board Stan. Rating
          May 1965          May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus May     r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    545.69   94.17    572.22   75.43    26.52             0.74  1.968       -13.61               0.59  0.940
Control   552.30   77.89    558.61   75.24    6.30              0.64  0.453







          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    547.04   74.24    572.22   75.43    25.17             0.74  2.196*      Same as above
Control   548.48   75.01    558.61   75.24    10.13             0.67  0.784







*Significant at 0.05 level (two-tailed test).
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*Significant at 0.05 level (two-tailed test). 
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TABLE D-XV (3) 



THE PERFORMANCE OF 11 MATCHED PAIRS OF STUDENTS ON COOPERATIVE ENGLISH TESTS:
ENGLISH EXPRESSION IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; INCLUDING
DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 3




          Cooperative English Test Converted Score
          Sept. 1964        Jan. 1965         Diff. in Means,                     Diff. in Jan. Means,
Subgroup  Mean     S.D.     Mean     S.D.     Jan. minus Sept.  r     t-Ratio     Cont. minus Exper.   r     t-Ratio
                                                                      df=10                                  df=10
Exper.    167.18   6.04     167.82   7.07     0.64              0.81  0.481       0.00                 0.51  0.000
Control   166.73   7.11     167.82   7.20     1.09              0.66  0.583





          Cooperative English Test Converted Score
          Jan. 1965         May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Jan.    r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    167.82   7.07     170.36   7.92     2.55              0.77  1.566       -1.45                0.59  0.665
Control   167.82   7.20     168.91   7.33     1.09              0.71  0.629




          Cooperative English Test Converted Score
          Sept. 1964        May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    167.18   6.04     170.36   7.92     3.18              0.79  2.079       Same as above
Control   166.73   7.11     168.91   7.33     2.18              0.78  1.426







          Cooperative English Test Converted Score
          May 1965          May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus May     r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    170.36   7.92     166.82   9.59     -3.55             0.21  1.010       3.73                 0.30  1.179
Control   168.91   7.33     170.55   6.85     1.64              0.56  0.776







          Cooperative English Test Converted Score
          Sept. 1964        May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    167.18   6.04     166.82   9.59     -0.36             0.27  0.116       Same as above
Control   166.73   7.11     170.55   6.85     3.82              0.62  1.993
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TABLE D-XVI(3) 



THE PERFORMANCE OF 11 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION 
BOARD ENGLISH COMPOSITION TEST IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966: 
INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 3 

          College Entrance Exam. Board Stan. Rating
          Sept. 1964        Jan. 1965         Diff. in Means,                     Diff. in Jan. Means,
Subgroup  Mean     S.D.     Mean     S.D.     Jan. minus Sept.  r     t-Ratio     Cont. minus Exper.   r     t-Ratio
                                                                      df=10                                  df=10
Exper.    496.91   85.62    530.36   95.08    33.45             0.65  1.399       -19.91               0.43  0.698
Control   497.91   82.82    510.45   69.10    12.55             0.66  0.623






          College Entrance Exam. Board Stan. Rating
          Jan. 1965         May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Jan.    r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    530.36   95.08    520.73   61.78    -9.64             0.95  0.728       -13.18               0.65  0.742
Control   510.45   69.10    507.55   71.08    -2.91             0.66  0.160






          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    496.91   85.62    520.73   61.78    23.82             0.58  1.059       Same as above
Control   497.91   82.82    507.55   71.08    9.64              0.92  0.932






          College Entrance Exam. Board Stan. Rating
          May 1965          May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus May     r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    520.73   61.78    567.55   94.07    46.82             0.76  2.389*      -8.73                0.71  0.406
Control   507.55   71.08    558.82   79.13    51.27             0.73  2.918*






          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    496.91   85.62    567.55   94.07    70.64             0.84  4.317*      Same as above
Control   497.91   82.82    558.82   79.13    60.91             0.72  3.202*





* Significant at 0.05 level (two-tailed test). 
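
The t-ratios in these tables can be approximated from the printed means, standard deviations, and correlations alone. The sketch below assumes the usual correlated-samples formula and assumes the S.D.s were computed with a divisor of n, so that the standard error of the mean difference divides by n - 1; neither assumption is stated in the tables themselves, and the function name is illustrative only. With the rounded values printed for the experimental subgroup of University 3 (September 1964 to May 1966) it gives about 4.34 against the printed 4.317.

```python
import math


def paired_t_from_summary(mean1, sd1, mean2, sd2, r, n):
    """Approximate t-ratio for a matched-sample mean difference.

    Assumption (not confirmed by the report): the S.D.s use a divisor
    of n, so the standard error of the difference divides by n - 1.
    """
    # Variance of the paired differences, from the two variances and r.
    var_diff = sd1 ** 2 + sd2 ** 2 - 2.0 * r * sd1 * sd2
    # Standard error of the mean difference.
    se_diff = math.sqrt(var_diff / (n - 1))
    return (mean2 - mean1) / se_diff


# Experimental subgroup, University 3, Sept. 1964 to May 1966 (11 pairs).
t = paired_t_from_summary(496.91, 85.62, 567.55, 94.07, r=0.84, n=11)
print(round(t, 2))
```

Small gaps between the computed and printed values are expected, because the inputs are rounded to two decimal places.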
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AND OF VARIOUS PERSISTING PORTIONS OF THAT SAMPLE: UNIVERSITY 4 













January 1965 20 45.0 56.00 20.59 21.25 3.63 21.60 3.25 160.55 8.76 
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-Significant at 0.05 level (two-tailed test) 
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Significant at 0.05 level (two-tailed test) 
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* Significant at 0.05 level (two-tailed test) 
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Significant at 0.05 level (two-tailed test) 
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^-Significant at 0.05 level (two-tailed test) 
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TABLE JV-XV(5) 

THE PERFORMANCE OF 45 MATCHED PAIRS OF STUDENTS ON COOPERATIVE ENGLISH TESTS: 
ENGLISH EXPRESSION IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966; INCLUDING 
DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 5 



          Cooperative English Test Converted Score
          Sept. 1964        Jan. 1965         Diff. in Means,                     Diff. in Jan. Means,
Subgroup  Mean     S.D.     Mean     S.D.     Jan. minus Sept.  r     t-Ratio     Cont. minus Exper.   r     t-Ratio
                                                                      df=44                                  df=44
Exper.    161.56   8.00     163.69   6.88     2.13              0.62  2.149*      0.31                 0.56  0.324
Control   163.49   7.42     164.00   6.65     0.51              0.65  0.569








          Cooperative English Test Converted Score
          Jan. 1965         May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Jan.    r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    163.69   6.88     167.73   6.43     4.04              0.65  4.789*      0.96                 0.71  1.268
Control   164.00   6.65     168.69   6.62     4.69              0.74  6.477*








          Cooperative English Test Converted Score
          Sept. 1964        May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    161.56   8.00     167.73   6.43     6.18              0.66  6.742*      Same as above
Control   163.49   7.42     168.69   6.62     5.20              0.67  6.004*








          Cooperative English Test Converted Score
          May 1965          May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus May     r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    167.73   6.43     164.80   9.60     -2.93             0.68  2.773*      1.84                 0.38  1.253
Control   168.69   6.62     166.64   7.76     -2.04             0.75  2.592*








          Cooperative English Test Converted Score
          Sept. 1964        May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    161.56   8.00     164.80   9.60     3.24              0.63  2.810*      Same as above
Control   163.49   7.42     166.64   7.76     3.16              0.72  3.650*







*Significant at 0.05 level (two-tailed test).
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TABLE D-XVI(5) 

THE PERFORMANCE OF 45 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE EXAMINATION
BOARD ENGLISH COMPOSITION TEST IN SEPTEMBER 1964, JANUARY 1965, MAY 1965, AND MAY 1966;
INCLUDING DIFFERENCES IN MEANS, STANDARD DEVIATIONS, AND t-RATIOS: UNIVERSITY 5

          College Entrance Exam. Board Stan. Rating
          Sept. 1964        Jan. 1965         Diff. in Means,                     Diff. in Jan. Means,
Subgroup  Mean     S.D.     Mean     S.D.     Jan. minus Sept.  r     t-Ratio     Cont. minus Exper.   r     t-Ratio
                                                                      df=44
Exper.    493.76   85.57    534.13   83.46    40.38             0.70  4.097*      -23.58               0.55  1.758
Control   475.91   86.00    510.56   100.68   34.64             0.77  3.552*






          College Entrance Exam. Board Stan. Rating
          Jan. 1965         May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Jan.    r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    534.13   83.46    535.13   71.51    1.00              0.59  0.094       -14.07               0.63  1.500
Control   510.56   100.68   521.07   73.91    10.51             0.77  1.088






          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1965          Diff. in Means,                     Diff. in May Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    493.76   85.57    535.13   71.51    41.38             0.68  4.297*      Same as above
Control   475.91   86.00    521.07   73.91    45.16             0.75  5.172*






          College Entrance Exam. Board Stan. Rating
          May 1965          May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus May     r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    535.13   71.51    559.40   80.01    24.27             0.65  2.533*      -13.64               0.50  1.138
Control   521.07   73.91    545.76   79.41    24.69             0.73  2.906*






          College Entrance Exam. Board Stan. Rating
          Sept. 1964        May 1966          Diff. in Means,                     Diff. in May 1966 Means,
Subgroup  Mean     S.D.     Mean     S.D.     May minus Sept.   r     t-Ratio     Cont. minus Exper.   r     t-Ratio
Exper.    493.76   85.57    559.40   80.01    65.64             0.56  5.598*      Same as above
Control   475.91   86.00    545.76   79.41    69.84             0.65  6.689*





*Significant at 0.05 level (two-tailed test).
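
The asterisks in these tables can be cross-checked against the two-tailed 0.05 critical values of Student's t for the degrees of freedom given (df = 10, 22, and 44). The sketch below hard-codes those critical values from a standard t table; the dictionary and function names are illustrative assumptions, not part of the report. It flags the experimental-subgroup t-ratios of Table D-XVI(5).

```python
# Two-tailed 0.05 critical values of Student's t from a standard t table,
# for the degrees of freedom that appear in these tables.
CRITICAL_T_05 = {10: 2.228, 22: 2.074, 44: 2.015}


def is_significant(t_ratio, df):
    """True when |t| meets or exceeds the two-tailed 0.05 critical value."""
    return abs(t_ratio) >= CRITICAL_T_05[df]


# Experimental-subgroup t-ratios from Table D-XVI(5), University 5, df = 44.
t_ratios = {
    "Sept. 1964 to Jan. 1965": 4.097,
    "Jan. 1965 to May 1965": 0.094,
    "Sept. 1964 to May 1965": 4.297,
    "May 1965 to May 1966": 2.533,
    "Sept. 1964 to May 1966": 5.598,
}

for period, t in t_ratios.items():
    flag = "*" if is_significant(t, 44) else ""
    print(f"{period}: {t:.3f}{flag}")
```

Only the January 1965 to May 1965 comparison falls below the critical value, which agrees with the asterisks printed in the table.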
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